> On Sat, 15 Jan 2005 16:49:06 -0600 (CST),
> Russ Anderson <[EMAIL PROTECTED]> wrote:
> >The MCA recovery driver saves the addresses of memory errors
> >in an array. The array has 32 entries. The effect is
> >that after 32 recoveries, the driver stops recovering.
> >
> >This patch removes the page_isolate array. Since the array
> >was only used to see if the page is already marked reserved,
> >check the reserved bit instead of the array.
>
> lkcd dumps kernel pages marked reserved, so lkcd will try to process
> isolated pages. We will eventually need to add a new page flag to mark
> faulty pages.
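The quoted patch description contrasts two ways of remembering which pages have already been isolated. The sketch below is a simplified user-space model, not the actual MCA driver code: `struct page`, `mem_map_sim`, `PG_RESERVED_BIT`, and both helper functions are hypothetical stand-ins. It only illustrates why a fixed 32-entry array stops recovery once it fills, while a per-page bit has no such limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's definitions. */
#define PG_RESERVED_BIT 0u  /* illustrative bit number */
#define NPAGES 128          /* size of the simulated memory map */
#define MAX_ISOLATED 32     /* size of the old page_isolate array */

struct page { unsigned long flags; };
static struct page mem_map_sim[NPAGES];

/* Old approach: record isolated page frame numbers in a fixed array. */
static unsigned long page_isolate[MAX_ISOLATED];
static int num_page_isolate;

static bool isolate_with_array(unsigned long pfn)
{
    for (int i = 0; i < num_page_isolate; i++)
        if (page_isolate[i] == pfn)
            return true;        /* already isolated earlier */
    if (num_page_isolate == MAX_ISOLATED)
        return false;           /* array full: recovery gives up */
    page_isolate[num_page_isolate++] = pfn;
    return true;
}

/* New approach: the page's own reserved bit records the isolation,
 * so there is no fixed upper bound on the number of recoveries. */
static bool isolate_with_flag(unsigned long pfn)
{
    mem_map_sim[pfn].flags |= 1ul << PG_RESERVED_BIT; /* idempotent */
    return true;
}
```

As the quoted reply notes, reusing the reserved bit has a side effect: lkcd dumps reserved pages, so a dedicated flag would be needed to tell a dump tool to skip faulty ones.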
Any other dump mechanism should probably be aware of bad HW pages as well,
so we might be better off adding a flag right away. While we are at it, I
would actually propose two flags:
- hard error (which will cause an MCA and should be skipped when taking
  a system dump)
- soft error (page has encountered an SBE (single-bit error), so we might
  want to avoid future allocations, but it can be dumped without causing
  an MCA)
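A minimal sketch of how the two proposed states could drive dump and allocation decisions, assuming one bit per state in a per-page flags word. The names `PG_HARD_ERROR` and `PG_SOFT_ERROR`, their bit values, and the helpers `page_dumpable()` and `page_allocatable()` are hypothetical illustrations, not existing kernel page flags or APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for the two proposed page states. */
enum {
    PG_HARD_ERROR = 1u << 0, /* uncorrectable error: reading the page raises an MCA */
    PG_SOFT_ERROR = 1u << 1, /* single-bit error seen: page is still safe to read */
};

struct page { unsigned int flags; }; /* simplified stand-in for the kernel's struct page */

/* A dump tool must skip hard-error pages, since touching them would raise another MCA. */
static bool page_dumpable(const struct page *p)
{
    return !(p->flags & PG_HARD_ERROR);
}

/* Pages with either kind of error should not be handed out for new allocations. */
static bool page_allocatable(const struct page *p)
{
    return !(p->flags & (PG_HARD_ERROR | PG_SOFT_ERROR));
}
```

Keeping the two states as separate bits lets a soft-error page still be captured in a dump while being withdrawn from the allocator, which is the distinction the list above draws.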
Then we need to define an API so that this information can be saved across
a reboot, so that another process doesn't have to hit the UCE (uncorrectable
error) again to find out that there really is a problem at that location.
Thanks
Matthias Fouquet-Lapar   Core Platform Software   [EMAIL PROTECTED]   VNET 521-8213
Principal Engineer       Silicon Graphics         Home Office (+33) 1 3047 4127