On 9/7/06, Robert Citek <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> We've got a Dell server (PowerEdge 1750) that's been working pretty
> well for us.  We've decided to install Ubuntu on it, but before we
> did so we figured we'd run a quick Memtest86 (v1.65).  These are some
> of the lines from the memtest:
>
> Tst  Pass  Failing Address         Good      Bad       Err-bits  Count Chan
> ---  ----  ----------------------  --------  --------  --------  ----- ----
> 1    0     0007fffdc80 - 2047.8MB  7fffdc80  00002685  7ffffa05      1
> 1    1     0007fffdc80 - 2047.8MB  7fffdc80  000032f1  7fffee71      1
> 1    2     0007fffdc80 - 2047.8MB  7fffdc80  00003f5d  7fffe3dd      1
>


From [1]:
    Tst: Test Number

    Failing Address: Failing memory address

    Good: Expected data pattern

    Bad: Failing data pattern

    Err-Bits: Exclusive or of good and bad data (this shows the
position of the failing bit(s))

    Count: Number of consecutive errors with the same address and failing bits
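
To make the Err-Bits column concrete, here is a minimal Python sketch
that recomputes it from the Good and Bad columns of the three rows
quoted above. The function names are my own, not anything from
Memtest86.

```python
# Rows copied from the quoted Memtest86 output:
# (pass number, good value, bad value, err-bits as reported)
ROWS = [
    (0, 0x7FFFDC80, 0x00002685, 0x7FFFFA05),
    (1, 0x7FFFDC80, 0x000032F1, 0x7FFFEE71),
    (2, 0x7FFFDC80, 0x00003F5D, 0x7FFFE3DD),
]


def err_bits(good, bad):
    """Return a mask with a 1 in every bit position where good and bad differ."""
    return good ^ bad


def failing_bit_positions(mask):
    """List the bit positions (0 = least significant) that are set in mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


if __name__ == "__main__":
    for pass_no, good, bad, reported in ROWS:
        mask = err_bits(good, bad)
        print(f"pass {pass_no}: err-bits {mask:08x}, "
              f"matches report: {mask == reported}, "
              f"{len(failing_bit_positions(mask))} bits differ")
```

In all three rows more than one bit differs, so this does not look
like a single flipped bit.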


So I'd say that at address 7fffdc80 the test failed to read back the
cell's own address three times (passes 0, 1 and 2), each time with a
different value: 00002685, 000032f1 and 00003f5d. Either that portion
of the memory chip or the memory controller chip is bad.

If you run it at different times, say after the machine has been
powered down for 12 hours and then again after it has been powered up
for at least an hour, do you get the same results? If you get
different results that aren't even in the same bank's address range,
it is likely to be the controller chip; otherwise, the bank.
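
As a rough sketch of that rule of thumb in Python: the bank layout
below is purely HYPOTHETICAL (two contiguous 1 GB banks), and the
names BANKS, bank_of and suspect are made up for illustration. A real
machine may interleave its banks, in which case the mapping from
address to bank is not this simple.

```python
# HYPOTHETICAL layout: two contiguous 1 GB banks as (start, end) with end
# exclusive. Check the server's documentation for the real mapping.
BANKS = {
    "bank 0": (0x00000000, 0x40000000),
    "bank 1": (0x40000000, 0x80000000),
}


def bank_of(addr):
    """Return the name of the bank whose range contains addr, or None."""
    for name, (low, high) in BANKS.items():
        if low <= addr < high:
            return name
    return None


def suspect(runs):
    """Apply the rule of thumb to a list of runs.

    runs holds one list of failing addresses per Memtest86 run.
    """
    banks = {bank_of(addr) for run in runs for addr in run}
    if not banks:
        return "no errors"
    if len(banks) == 1 and None not in banks:
        # Every failure, across all runs, fell inside one bank's range.
        return "memory in " + banks.pop()
    # Failures scattered over several banks point away from a single bank.
    return "controller"


if __name__ == "__main__":
    # Two runs that both fail at the address from the quoted output.
    print(suspect([[0x7FFFDC80], [0x7FFFDC80]]))
```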


The test it ran was: (from [1] again)

Test 1 [Address test, own address]

    Each address is written with its own address and then is checked
for consistency. In theory previous tests should have caught any
memory addressing problems. This test should catch any addressing
errors that somehow were not previously detected.
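
To illustrate what an "own address" test does, here is a toy Python
simulation. It is not Memtest86's actual code: FaultyMemory is a
made-up stand-in for real RAM, and the 32-bit word size is an
assumption.

```python
WORD = 4  # bytes per test word (assumption: 32-bit words)


class FaultyMemory:
    """Toy memory backed by a dict, with one optional misbehaving address."""

    def __init__(self, bad_addr=None, bad_value=0):
        self.cells = {}
        self.bad_addr = bad_addr
        self.bad_value = bad_value

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        # The faulty cell ignores what was written and returns garbage.
        if addr == self.bad_addr:
            return self.bad_value
        return self.cells[addr]


def own_address_test(mem, start, end):
    """Write each address into itself, then read everything back.

    Returns a list of (address, good, bad) tuples for every mismatch.
    """
    for addr in range(start, end, WORD):
        mem.write(addr, addr)
    errors = []
    for addr in range(start, end, WORD):
        got = mem.read(addr)
        if got != addr:
            errors.append((addr, addr, got))
    return errors


if __name__ == "__main__":
    # Simulate the first failing row from the quoted output.
    mem = FaultyMemory(bad_addr=0x7FFFDC80, bad_value=0x00002685)
    for addr, good, bad in own_address_test(mem, 0x7FFFDC00, 0x7FFFDD00):
        print(f"{addr:011x}  {good:08x}  {bad:08x}  {good ^ bad:08x}")
```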

>
> We also ran Dell's own memory diagnostic, which didn't produce any
> errors.
>

Just putting on my conspiracy hat here, but doesn't Dell benefit as a
company if its testing system finds fewer errors (i.e. its quality
appears to be higher)?

I'd sooner trust a different testing program first. Can you move the
banks around? Do you have another Dell server to swap with?


> The questions I have are:
>
> 1) does are machine have problems with RAM or not?
Probably, especially if repeated test runs under different conditions
(uptime, heat, cosmic ray exposure) still produce errors.
> 2) what do the errors that Memtest86 output mean?
See above
> 3) why is there a discrepancy between Dell's test and Memtest86?
Dell probably sucks more at this, maybe on purpose.


> Regards,
> - Robert

[1] http://www.memtest86.com/#display

-- 
Ed Howland
http://greenprogrammer.blogspot.com

 
_______________________________________________
CWE-LUG mailing list
[email protected]
http://www.cwelug.org/
http://www.cwelug.org/archives/
http://www.cwelug.org/mailinglist/