Doug Cutting wrote:
> Michael Stack has some experience tracking down problems with flaky
> memory. Michael, did you use a test program to validate the memory on
> a node?
One of the lads at the Archive used to run CTCS,
http://sourceforge.net/projects/va-ctcs/. It was good for weeding out
bad hardware. But we also found that machines that passed multiple CTCS
burn-ins could still throw checksum errors (these were non-ECC
machines).
St.Ack
P.S. Pardon the tardy reply. Have been offline for the last couple of weeks.