Paul B. Henson wrote:
> On Thu, 17 Sep 2009, Steve Hanson wrote:
>
>> There is a brief period after a reboot (prior to the fmd daemon
>> restarting) where the page can be used. However once fmd starts up it
>> will immediately re-retire the page.
>
> Ok, so it maintains state. The page will continue to be retired until at
> some point the entire DIMM is marked faulty (if enough failures occur to do
> that), presumably replaced, and when the fault is marked as resolved the
> pages will no longer be retired? What happens if the memory is swapped out,
> for example a memory upgrade? Will it notice that the DIMM has changed and
> reset the fault information? Do DIMMs have serial numbers? Hypothetically
> if the DIMM is swapped out for the exact same model (for whatever reason)
> will fm know that it is a different DIMM and reset the fault state?

Yes, DIMMs do have serial numbers, and FMA is able to use the serial number
to detect DIMM replacement and effect an automatic repair of memory faults on
most platforms, where "most platforms" means Intel and SPARC platforms and a
subset of AMD platforms. I wrote about how some of this works in the
following blog entry:
http://blogs.sun.com/robj/entry/fma_and_dimm_serial_numbers
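
As a rough illustration of the replacement check described above (a sketch
only, not FMA's actual code): a fault record remembers the serial number of
the DIMM it was diagnosed against, and if the serial number now reported for
that slot differs, the fault is treated as repaired and its retired pages are
released. The names FaultRecord and reconcile, the slot label, and the idea
that an unreadable serial is reported as None are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FaultRecord:
    """Hypothetical persisted fault state for one DIMM slot."""
    slot: str
    dimm_serial: str                      # serial recorded at diagnosis time
    retired_pages: set = field(default_factory=set)
    repaired: bool = False


def reconcile(record, current_serial):
    """Compare the recorded serial with the one now read from the slot.

    Returns True only when the fault is auto-repaired because the DIMM
    in the slot is no longer the one the fault was diagnosed against.
    """
    if record.repaired:
        # Nothing left to repair.
        return False
    if current_serial is None:
        # Assumption: serial unreadable on this platform, so keep the fault.
        return False
    if current_serial != record.dimm_serial:
        # Different DIMM in the slot: clear the fault and release its pages.
        record.repaired = True
        record.retired_pages.clear()
        return True
    # Same DIMM: the fault, and therefore the page retirement, persists.
    return False
```

With a record created for serial "SN-A", reconcile(record, "SN-A") leaves the
pages retired, which matches the re-retire-after-reboot behaviour quoted
above, while reconcile(record, "SN-B") clears the fault even if the new DIMM
is the same model.
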
rob
_______________________________________________
fm-discuss mailing list
fm-discuss@opensolaris.org