[zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
It seems my hardware is getting bad, and I can't keep the OS running for more than a few minutes until the machine shuts down. It will run 15 or 20 minutes and then shut down. I haven't found the exact reason for it, or really anything in the logs that seems like a reason. It may be because I don't

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Toby Thain
On 27/10/10 3:14 PM, Harry Putnam wrote: It seems my hardware is getting bad, and I can't keep the OS running for more than a few minutes until the machine shuts down. It will run 15 or 20 minutes and then shut down. I haven't found the exact reason for it. One thing to try is a thorough

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
Toby Thain t...@telegraphics.com.au writes: On 27/10/10 3:14 PM, Harry Putnam wrote: It seems my hardware is getting bad, and I can't keep the OS running for more than a few minutes until the machine shuts down. It will run 15 or 20 minutes and then shut down. I haven't found the exact

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Krunal Desai
I believe he meant a memory stress test, i.e. booting with a memtest86+ CD and seeing if it passed. Even if the memory is OK, the stress from that test may expose defects in the power supply or other components. Your CPU temperature is 56C, which is not out-of-line for most modern CPUs (you

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Toby Thain
On 27/10/10 4:21 PM, Krunal Desai wrote: I believe he meant a memory stress test, i.e. booting with a memtest86+ CD and seeing if it passed. Correct. The POST tests are not adequate. --Toby Even if the memory is OK, the stress from that test may expose defects in the power supply or other

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
Krunal Desai mov...@gmail.com writes: I believe he meant a memory stress test, i.e. booting with a memtest86+ CD and seeing if it passed. Even if the memory is OK, the stress from that test may expose defects in the power supply or other components. Your CPU temperature is 56C, which is not

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
Toby Thain t...@telegraphics.com.au writes: On 27/10/10 4:21 PM, Krunal Desai wrote: I believe he meant a memory stress test, i.e. booting with a memtest86+ CD and seeing if it passed. Correct. The POST tests are not adequate. Got it. Thank you. Short of doing such a test, I have

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Krunal Desai
With an A64, I think a thermal shutdown would instantly halt CPU execution, removing the chance to write any kind of log message. memtest will report any errors in RAM; perhaps when the ARC expands to the upper stick of memory it hits the bad bytes and crashes. Can you try switching power

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Glenn Lagasse
* Harry Putnam (rea...@newsguy.com) wrote: Toby Thain t...@telegraphics.com.au writes: On 27/10/10 4:21 PM, Krunal Desai wrote: I believe he meant a memory stress test, i.e. booting with a memtest86+ CD and seeing if it passed. Correct. The POST tests are not adequate. Got it.

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
Krunal Desai mov...@gmail.com writes: With an A64, I think a thermal shutdown would instantly halt CPU execution, removing the chance to write any kind of log message. memtest will report any errors in RAM; perhaps when the ARC expands to the upper stick of memory it hits the bad bytes and

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Bob Friesenhahn
On Wed, 27 Oct 2010, Harry Putnam wrote: I have been having some trouble with corrupted data in one pool but I thought I'd gotten it cleared up and posted to that effect in another thread. zpool status on all pools shows thumbs up. What are some key words I should be looking for in

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Peter Jeremy
On 2010-Oct-28 04:45:16 +0800, Harry Putnam rea...@newsguy.com wrote: Short of doing such a test, I have evidence already that the machine will predictably shut down after 15 to 20 minutes of uptime. My initial guess is thermal issues. Check that the fans are running correctly and there's no

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
Peter Jeremy peter.jer...@alcatel-lucent.com writes: It seems there ought to be something, some kind of evidence and clues if I only knew how to look for them, in the logs. Serious hardware problems are unlikely to be in the logs because the system will die before it can write the error to

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Mike Gerdts
On Wed, Oct 27, 2010 at 3:41 PM, Harry Putnam rea...@newsguy.com wrote: I'm guessing it was probably more like 60 to 62 C under load. The temperature I posted was taken after something like 5 minutes of being totally shut down, and the case has been open for a long while (months if not years). What happens

Re: [zfs-discuss] hardware going bad

2010-10-27 Thread Harry Putnam
Mike Gerdts mger...@gmail.com writes: [...] Thanks for the suggestions. I have closed it all up to see if there is a difference. Perhaps this belongs somewhere other than zfs-discuss - it has nothing to do with zfs. Yes... it does. It started out much nearer to belonging here. Not sure now