Thanks for the insightful info. Yes, as another
user had suggested privately, I have been running
memtest86 pretty much since my post last night
(early morning).

Thus far: 16 passes in almost 17 hours, and
no errors.  I know, though, as you pointed out,
that a clean run doesn't really rule out bad
memory module(s).

I'm going to try swapping out modules; maybe I'll
get lucky.



--- Marcus Watts <[EMAIL PROTECTED]> wrote:

> > I've not seen this type of problem before, so I
> > turn to you guys.  Is this a sign that maybe
> > a drive is going bad?  Or sign of bad memory?
> > 
> > What's going on here!?  I know it is almost
> > Halloween and all, but this is kinda _spooky_
> > to say the least.
> > 
> > 
> > Ideas? Please? :-)
> 
> Hard drives contain lots of moving parts, a known reliability risk.
> Therefore most if not all modern hard disks and associated logic
> contain more or less elaborate internal self-checking logic to detect
> failing media, failing spindle motor, failing head positioning
> mechanism, over and under voltage, bus driver failure, etc.  Most of
> these will result in kernel messages and/or other obvious signs of
> system distress.  Your "dmesg" (assuming it was done after the failed
> build) doesn't show any evidence of such problem, so there's no reason
> to suspect a hard disk going bad.
> 
> More likely possibilities are bad memory, a bad motherboard,
> incompatible memory, a bad disk controller, mis-configured bus speeds,
> an environmental problem, or, possibly but less likely, a bad CPU.  Memory
> is simple: if you buy a "consumer grade" home machine, you get memory
> that has no self-check logic.  A chip going bad could well produce the
> problems you show below.  A "server class" machine will nearly always
> contain ECC memory.  A few companies (Dell, Sun) also make "commercial
> grade" desktop machines, which usually also contain ECC.  Note that
> most "home computer" stores and even many professionals don't understand
> or value ECC memory, and will steer you away from such technology.
> 
> If it's memory, it may still be easy to see that it's broken, even
> without self-check logic.  "memtest86+" has a good reputation.  This is a
> stand-alone program, which you can leave running overnight.  If it
> fails memtest86+, then the problem is obvious.  If it passes, the
> memory is still not in the clear; for instance, it's in theory possible
> for the memory to fail when accessed by DMA but not by the processor.
> If you can get the memory to fail more or less predictably, and you
> have multiple memory modules, you may be able to play remove & swap
> games to identify which module is bad.  Check your hardware docs first -
> on some systems, modules may need to be paired in some particular
> fashion.
> 
> It is certainly worth checking your machine for obvious physical
> problems.  For instance, check air paths to ensure they aren't
> blocked.  Be suspicious of burning smells, obvious heat, excessive fan
> noise, or lack of distinct air flow.  Check the inside of the machine.
> Is there excessive dust build-up?  Are the fan blades clean?  Do the
> fans spin very smoothly and fairly freely?  Are the cables in the way?
> Are there any loose cables?  Loose boards?  Bad solder joints or
> cracks?  (On most modern motherboards, it's not worth spending much
> time checking this if it's not easy to get to; removing the motherboard
> may itself cause damage, and even a "large" crack sufficient to produce
> complete failure may be nearly impossible to spot).  Other signs of
> physical distress?  Ideally you want your machine to be in a
> climate-controlled environment comfortable to people.  Dust, very dry
> air, excessive moisture, temperature cycles, etc. are all bad.
> Electrically conductive dust can become particularly exciting.
> 
> An older or fancier machine may have a separate disk controller, in
> which case if you have a spare it may be worth swapping.  Your machine
> is probably not one of these.
> 
> On many newer machines, the BIOS can contain settings which alter the
> speed or timing of various bus components.  Getting this wrong can
> produce subtle weirdness, or obvious and dramatic signs of failure.
> It may take a while for subtle weirdness to manifest itself in any
> obvious fashion.  If you have ECC memory, make sure the BIOS knows that.
> 
> Sorting all this out can take time.  If the machine is an older one, it
> may be cheaper to replace it than figure out what failed.
> 
> Also, in case you missed it, building large software packages is
> an excellent way to burn a new machine in or establish
> that an existing machine is reliable.  :-)
> 
>                               -Marcus


 