RE: PowerEdge 1950 ECC question...

2010-02-03 Thread Henrik Schmiediche

The RAM passes Memtest86+ v4.0. I know it is strange.

  - Henrik

-Original Message-
From: linux-poweredge-boun...@dell.com
[mailto:linux-poweredge-boun...@dell.com] On Behalf Of Hostmaster
Sent: Wednesday, February 03, 2010 10:27 AM
To: linux-powere...@lists.us.dell.com
Subject: RE: PowerEdge 1950 ECC question...

It might have to be trial-and-error then to work out which stick(s) is/are
faulty, but you might want to give memtest86 a look.

Regards,
Richard

-Original Message-
From: linux-poweredge-boun...@dell.com
[mailto:linux-poweredge-boun...@dell.com]
On Behalf Of Henrik Schmiediche
Posted At: 03 February 2010 16:21
Posted To: Hostmaster
Conversation: PowerEdge 1950 ECC question...
Subject: RE: PowerEdge 1950 ECC question...


I cannot get the node to start up to the point where OMSA runs. It freezes on
startup with the bad RAM.

  - Henrik

-Original Message-
From: Ryan Miller [mailto:rmil...@smartertravelmedia.com] 
Sent: Wednesday, February 03, 2010 10:19 AM
To: Henrik Schmiediche; linux-powere...@lists.us.dell.com
Subject: RE: PowerEdge 1950 ECC question...

Are you running OMSA?  It usually is able to pinpoint which stick is the
cause of single bit errors.  It can be helpful to stress the RAM with some
heavy compiles/DMA to make the bad stick throw an SBE sooner.
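
On Linux you can also see which memory rank is logging single-bit errors without OMSA. The kernel's EDAC subsystem exposes per-csrow counters named ce_count (corrected) and ue_count (uncorrected) under /sys/devices/system/edac/mc/. The i5000_edac driver, mentioned later in this thread, covers the Intel 5000 chipset. The sketch below assumes that driver is loaded and that the kernel still provides the older csrow-style sysfs layout. The function names are hypothetical, and mapping a csrow back to a physical slot still needs the board's DIMM labels or OMSA. Running it before and after a heavy compile shows whether the stress run made one rank start counting errors.

```python
from pathlib import Path

# Default location of the Linux EDAC memory-controller tree (assumption:
# older csrow-style layout, as provided by drivers such as i5000_edac).
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_count(path):
    """Read one integer counter file; a missing or unreadable file counts as 0."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return 0


def csrow_error_counts(root=EDAC_ROOT):
    """Return {'mcN/csrowM': (corrected, uncorrected)} for every csrow found."""
    counts = {}
    for csrow in sorted(Path(root).glob("mc*/csrow*")):
        key = f"{csrow.parent.name}/{csrow.name}"
        counts[key] = (read_count(csrow / "ce_count"),
                       read_count(csrow / "ue_count"))
    return counts


if __name__ == "__main__":
    # Only report ranks that have logged something.
    for row, (ce, ue) in csrow_error_counts().items():
        if ce or ue:
            print(f"{row}: {ce} corrected, {ue} uncorrected")
```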

 -Original Message-
 From: linux-poweredge-boun...@dell.com [mailto:linux-poweredge-
 boun...@dell.com] On Behalf Of Henrik Schmiediche
 Sent: Wednesday, February 03, 2010 11:09 AM
 To: linux-powere...@lists.us.dell.com
 Subject: PowerEdge 1950 ECC question...
 
  Hello,
 Is it possible to turn ECC off on the PowerEdge 1950? I have servers from
 other manufacturers where this is possible, but I cannot find this option in
 the PE 1950. I'd like to test the memory with ECC turned off.
 
 Here is the background:
 
 I have a PowerEdge 1950 (one of 100+ identical systems) that freezes on
 startup. I can reimage the node fine, but the freeze on startup persists.
 The node passes Dell Diagnostics, including overnight memory testing.
 There is nothing in the ESM log.
 
 I reseated the RAM and other components. No solution.
 
 On a lark I decided to change all 8 memory sticks... this solved the
 problem! The system starts up fine.
 
 So it seems there is a bad memory module, but I have no idea which one.
 I am trying to avoid the one-by-one (or batch-by-batch) testing method,
 and I was thinking that (maybe) turning ECC off might help locate the bad
 module.
 
 Any ideas?
 
   - Henrik
 
 
 ___
 Linux-PowerEdge mailing list
 Linux-PowerEdge@dell.com
 https://lists.us.dell.com/mailman/listinfo/linux-poweredge
 Please read the FAQ at http://lists.us.dell.com/faq
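
Here is why turning ECC off could help. With ECC enabled, the memory controller corrects a single flipped bit on every read. A pattern tester such as memtest therefore only ever sees the corrected data, and the fault shows up only if the correction is logged (ESM, OMSA, EDAC). The toy Hamming(7,4) code below illustrates that effect. It is a teaching example only: it is far narrower than the ECC the Intel 5000 chipset applies to FB-DIMM data, and it is not that chipset's algorithm. The encode/decode names are hypothetical.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming(7,4) codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    c = [0] * 8                      # index 0 unused, so index == bit position
    c[3], c[5], c[6], c[7] = d       # data bits
    c[1] = c[3] ^ c[5] ^ c[7]        # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]        # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]        # parity over positions with bit 2 set
    return c[1:]


def decode(word):
    """Return (data nibble, corrected position); position 0 means no error seen."""
    c = [0] + list(word)
    syndrome = ((c[1] ^ c[3] ^ c[5] ^ c[7])
                | (c[2] ^ c[3] ^ c[6] ^ c[7]) << 1
                | (c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    if syndrome:
        c[syndrome] ^= 1             # the syndrome is the position of the flipped bit
    return c[3] | c[5] << 1 | c[6] << 2 | c[7] << 3, syndrome


if __name__ == "__main__":
    stored = encode(0b1011)
    stored[4] ^= 1                   # simulate a one-bit fault at position 5
    value, pos = decode(stored)
    # The reader gets the right value back; only the syndrome reveals the fault.
    print(f"read back {value:04b}, corrected bit {pos}")  # read back 1011, corrected bit 5
```

A tester that only compares `value` against the written pattern would report this memory as good. That matches Henrik's observation that the suspect RAM passes every test.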

 
 
All E-Mail communications are monitored in addition to being content checked
for malicious codes or viruses. The success of scanning products is not
guaranteed, therefore the recipient(s) should carry out any checks that they
believe to be appropriate in this respect.
 
This message (including any attachments and/or related materials) is
confidential to and is the property of Computer Service Centre, unless
otherwise noted. If you are not the intended recipient, you should delete
this message and are hereby notified that any disclosure, copying, or
distribution of this message, or the taking of any action based on it, is
strictly prohibited.
 
Any views or opinions presented are solely those of the author and do not
necessarily represent those of Computer Service Centre.



RE: PowerEdge 1950 ECC question...

2010-02-03 Thread Henrik Schmiediche
Did that. The system is up using the new RAM, and OMSA and related utilities
are running. There are no memory-related entries in the ESM log. The old RAM
freezes the system, but no error of any kind is generated in ESM, memtest,
mpmemory, or the Dell diags.

   -Henrik

-Original Message-
From: linux-poweredge-boun...@dell.com
[mailto:linux-poweredge-boun...@dell.com] On Behalf Of Tino Schwarze
Sent: Wednesday, February 03, 2010 10:30 AM
To: linux-poweredge@dell.com; linux-powere...@lists.us.dell.com
Subject: Re: PowerEdge 1950 ECC question...

On Wed, Feb 03, 2010 at 10:21:10AM -0600, Henrik Schmiediche wrote:

 I cannot get the node to start up to the point where OMSA runs. It freezes on
 startup with the bad RAM.

Just take out all RAM (and note which DIMM was in which slot), insert
known-good RAM, then have a look at the ESM log. It will tell you where
the errors occurred (exact DIMM location). Then move all the other RAM in
and perform an extensive memory test.

HTH,

Tino.

-- 
What we nourish flourishes. - Was wir nähren erblüht.

www.lichtkreis-chemnitz.de
www.tisc.de



Re: PowerEdge 1950 ECC question...

2010-02-03 Thread Tim Small
It seems probable that the memory fault is causing the BIOS to crash 
before it gets a chance to enable ECC - thus no errors are logged.  It 
could also be a bus-loading issue with the FB-DIMMs (such that no memory
transactions can be issued, maybe due to a faulty AMB chip on one of the sticks).

Try the system with half the original RAM in at a time - you could also 
try moving each stick up by four slots (and wrapping round).

Tim.
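
Tim's halving suggestion is a binary search. Assume exactly one faulty module, and assume the freeze reproduces with only part of the memory installed. Under those assumptions, eight suspects take three boots instead of eight. In the sketch below, `boots_ok` is a hypothetical callback standing in for a manual power-on test, and the DIMM1 to DIMM8 labels are placeholders. The PE 1950's population rules may require bisecting matched pairs rather than single sticks. A fault that only appears with all eight FB-DIMMs loaded (the bus-loading case above) will not be isolated this way. `rotate_slots` models the "move each stick up by four slots" test. Combined with the half-populated boots, it shows whether a failure follows the sticks (suggesting a module) or stays with the slots (suggesting the board or a channel).

```python
def find_faulty(suspects, boots_ok):
    """Binary-search for a single faulty module.

    boots_ok(modules) is a hypothetical stand-in for a manual test: install
    exactly `modules`, power on, and return True if the system starts cleanly.
    Assumes exactly one faulty module among `suspects`.
    Returns (faulty_module, number_of_boot_tests).
    """
    candidates = list(suspects)
    tests = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        first_half = candidates[:mid]
        tests += 1
        if boots_ok(first_half):
            candidates = candidates[mid:]   # fault must be in the other half
        else:
            candidates = first_half
    return candidates[0], tests


def rotate_slots(layout, shift=4):
    """Move the module in slot i to slot (i + shift) mod len(layout)."""
    n = len(layout)
    rotated = [None] * n
    for slot, module in enumerate(layout):
        rotated[(slot + shift) % n] = module
    return rotated


if __name__ == "__main__":
    sticks = [f"DIMM{i}" for i in range(1, 9)]   # placeholder labels
    bad = "DIMM6"                                # pretend fault, for illustration
    print(find_faulty(sticks, lambda mods: bad not in mods))  # ('DIMM6', 3)
    print(rotate_slots(sticks))
```

Passing a list of pairs instead of single sticks works unchanged: four pairs take two boots.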


Henrik Schmiediche wrote:
 Did that. The system is up using the new RAM, and OMSA and related utilities
 are running. There are no memory-related entries in the ESM log. The old RAM
 freezes the system, but no error of any kind is generated in ESM, memtest,
 mpmemory, or the Dell diags.
   


-- 
South East Open Source Solutions Limited
Registered in England and Wales with company number 06134732.  
Registered Office: 2 Powell Gardens, Redhill, Surrey, RH1 1TQ
VAT number: 900 6633 53  http://seoss.co.uk/ +44-(0)1273-808309



RE: PowerEdge 1950 ECC question...

2010-02-03 Thread Henrik Schmiediche

Hmm... the parts in the Dell were Dell parts; the replacement parts (that
work) are non-Dell. Before the RAM swap fixed the problem, I thought the
problem might be the mainboard as well (and it still may be). I am not sure how
I will proceed, but you are right in thinking this may be a voltage/mainboard
issue. My original question (turning off ECC for memory testing) was hopefully
going to narrow down the issue.

 - Henrik


-Original Message-
From: linux-poweredge-boun...@dell.com 
[mailto:linux-poweredge-boun...@dell.com] On Behalf Of Hostmaster
Sent: Wednesday, February 03, 2010 10:44 AM
To: linux-powere...@lists.us.dell.com
Subject: RE: PowerEdge 1950 ECC question...

Was the replacement RAM you used absolutely identical down to the part numbers
and number of sticks inserted? I had a very bizarre problem a year or two ago
with similar behaviour in non-Dell hardware and it turned out to be the
mainboard. The system would lock up in certain scenarios, however swap the RAM
out (four sticks down to two, lower FSB) and it was perfectly stable. The
original RAM in an identical mainboard in a different system was also stable. I
put it down to VRM/power issues on the board, and a replacement mainboard solved
the problem.

Just a thought..

Richard

-Original Message-
From: linux-poweredge-boun...@dell.com [mailto:linux-poweredge-boun...@dell.com]
On Behalf Of Henrik Schmiediche
Posted At: 03 February 2010 16:33
Posted To: Hostmaster
Conversation: PowerEdge 1950 ECC question...
Subject: RE: PowerEdge 1950 ECC question...

Did that. The system is up using the new RAM, and OMSA and related utilities
are running. There are no memory-related entries in the ESM log. The old RAM
freezes the system, but no error of any kind is generated in ESM, memtest,
mpmemory, or the Dell diags.

   -Henrik

Re: PowerEdge 1950 ECC question...

2010-02-03 Thread Tim Small
Henrik Schmiediche wrote:
 My original question (turning off ECC for memory testing) was hopefully going
 to narrow down the issue.
   

It's probably possible to disable ECC after boot time using setpci (I've
used it to read and write the ECC status registers on Intel chipsets in
the past), but I don't know the details of the i5000 ECC implementation
(I don't even know if the ECC functionality is still controlled via PCI
Configuration space) - you'll have to check the datasheet (or the
i5000_edac driver source).

ISTR, memtest86 and memtest86+ both had some functionality for
reading/writing ECC status bits on some chipsets as well, so you could
hack on these too (but the code was a bit messed up last time I looked -
I think ECC no, and ECC NO had different meanings!)...

Tim.
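
For the setpci route, you can do the same read from Python through the device's sysfs `config` file. For example, `setpci -s 00:10.1 40.L` reads 32 bits at offset 0x40 of device 00:10.1. The device address and any offset passed here are placeholders, not the Intel 5000's real ECC registers; those would have to come from the datasheet or the i5000_edac source. Reading past the first 64 bytes of configuration space normally requires root. The sketch deliberately only reads, because writing chipset ECC control bits on a running system could hang it. `read_config` and `EXAMPLE_DEVICE` are hypothetical names.

```python
import struct
from pathlib import Path

# setpci-style width suffixes mapped to little-endian struct formats.
FORMATS = {"B": "<B", "W": "<H", "L": "<I"}

# Placeholder address; substitute the chipset function reported by lspci.
EXAMPLE_DEVICE = "0000:00:10.1"


def read_config(device, offset, width="L", root="/sys/bus/pci/devices"):
    """Read one register from a PCI device's configuration space via sysfs.

    Roughly what `setpci -s <device> <offset>.<width>` reads. Config space is
    little-endian. Offsets past 0x3f usually need root.
    """
    fmt = FORMATS[width.upper()]
    size = struct.calcsize(fmt)
    with open(Path(root) / device / "config", "rb") as f:
        f.seek(offset)
        data = f.read(size)
    if len(data) != size:
        raise ValueError(f"offset {offset:#x} not readable (beyond file, or not root?)")
    return struct.unpack(fmt, data)[0]


if __name__ == "__main__":
    try:
        vendor = read_config(EXAMPLE_DEVICE, 0x00, "W")   # Vendor ID register
        print(f"{EXAMPLE_DEVICE}: vendor ID {vendor:#06x}")
    except FileNotFoundError:
        print(f"{EXAMPLE_DEVICE} not present on this machine")
```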
