On 9/7/06, Robert Citek <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> We've got a Dell server (PowerEdge 1750) that's been working pretty
> well for us. We've decided to install Ubuntu on it, but before we
> did so we figured we'd run a quick Memtest86 (v1.65). These are some
> of the lines from the memtest:
>
> Tst  Pass  Failing Address         Good      Bad       Err-bits  Count  Chan
> ---  ----  ----------------------  --------  --------  --------  -----  ----
>   1     0  0007fffdc80 - 2047.8MB  7fffdc80  00002685  7ffffa05      1
>   1     1  0007fffdc80 - 2047.8MB  7fffdc80  000032f1  7fffee71      1
>   1     2  0007fffdc80 - 2047.8MB  7fffdc80  00003f5d  7fffe3dd      1
>
From [1]:
Tst: Test Number
Failing Address: Failing memory address
Good: Expected data pattern
Bad: Failing data pattern
Err-Bits: Exclusive or of good and bad data (this shows the
position of the failing bit(s))
Count: Number of consecutive errors with the same address and failing bits
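
As a quick sanity check on the Err-bits definition, here is a minimal Python sketch that XORs the Good and Bad columns from the three rows you posted and compares the result with the Err-bits value Memtest86 reported. The helper name err_bits is my own, not anything from Memtest86.

```python
# Rows copied from the Memtest86 output quoted above:
# (good, bad, err-bits as reported by Memtest86)
rows = [
    (0x7fffdc80, 0x00002685, 0x7ffffa05),
    (0x7fffdc80, 0x000032f1, 0x7fffee71),
    (0x7fffdc80, 0x00003f5d, 0x7fffe3dd),
]


def err_bits(good, bad):
    """XOR of expected and observed data; set bits mark positions that differ."""
    return good ^ bad


for good, bad, reported in rows:
    computed = err_bits(good, bad)
    differing = bin(computed).count("1")  # how many bit positions differ
    print(f"{good:08x} ^ {bad:08x} = {computed:08x} "
          f"(matches report: {computed == reported}, differing bits: {differing})")
```

The loop also counts how many bit positions differ in each row, which helps tell a single stuck bit apart from a whole word coming back wrong.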
So I'd say the location at address 7fffdc80 failed to hold its own
address on three passes, each time reading back a different value
(00002685 and so on). Either that portion of the chip or the addressing
controller chip is bad.
If you run the test at different times, say after the machine has been
powered down for 12 hours and then powered up for at least 1 hour, do
you get the same results? If you get different results that aren't even
in the same bank's address range, it is likely to be the controller
chip; otherwise, the bank.
The test it ran was: (from [1] again)
Test 1 [Address test, own address]
Each address is written with its own address and then is checked
for consistency. In theory previous tests should have caught any
memory addressing problems. This test should catch any addressing
errors that somehow were not previously detected.
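
To make the description concrete, here is a rough Python sketch of an own-address test run against a simulated memory. It is only an illustration of the idea, not Memtest86's actual code: FakeMemory, own_address_test, and the address range are all hypothetical, and the faulty cell is rigged to return the first Bad value from your output.

```python
def own_address_test(read, write, addresses):
    """Write each address into its own location, then read every location back.

    Returns a list of (address, expected, actual) tuples, one per mismatch.
    """
    for addr in addresses:
        write(addr, addr)
    errors = []
    for addr in addresses:
        actual = read(addr)
        if actual != addr:
            errors.append((addr, addr, actual))
    return errors


class FakeMemory:
    """Hypothetical stand-in for RAM: a dict with one optional faulty cell."""

    def __init__(self, bad_addr=None, bad_value=0):
        self.cells = {}
        self.bad_addr = bad_addr
        self.bad_value = bad_value

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        if addr == self.bad_addr:
            return self.bad_value  # the faulty cell ignores what was written
        return self.cells[addr]


# A small word-aligned window around the failing address (assumed range).
addresses = range(0x7fffdc00, 0x7fffdd00, 4)
mem = FakeMemory(bad_addr=0x7fffdc80, bad_value=0x00002685)

for addr, good, bad in own_address_test(mem.read, mem.write, addresses):
    print(f"address {addr:08x}: good {good:08x} bad {bad:08x} "
          f"err-bits {good ^ bad:08x}")
```

Because the expected data is the address itself, the Good column in the report equals the failing address, which is exactly what your three rows show.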
>
> We also ran Dell's own memory diagnostic, which didn't produce any
> errors.
>
Just putting on my conspiracy hat here, but doesn't Dell benefit as a
company if their testing system finds fewer errors (i.e., their quality
appears higher)?
I'd sooner trust third-party testing software. Can you move the
banks around? Do you have another Dell server to swap with?
> The questions I have are:
>
> 1) does our machine have problems with RAM or not?
Probably, especially if repeated test runs under different conditions
(uptime, heat, cosmic-ray exposure) still produce errors.
> 2) what do the errors that Memtest86 output mean?
See above
> 3) why is there a discrepancy between Dell's test and Memtest86?
Dell probably sucks more at this, maybe on purpose.
> Regards,
> - Robert
[1] http://www.memtest86.com/#display
--
Ed Howland
http://greenprogrammer.blogspot.com
_______________________________________________
CWE-LUG mailing list
[email protected]
http://www.cwelug.org/
http://www.cwelug.org/archives/
http://www.cwelug.org/mailinglist/