Summary: 
For those who didn't follow the thread, I was investigating this error
message:
"Kernel panic - not syncing: Aiee, killing interrupt handler."
after which the computer freezes completely and the only thing that
works is the power switch.

The error appears only under heavy load, such as compiling.  This is a
new box: Asus A8V, AMD64, 1 GB of RAM (PC3200 DDR400 Kingston RAM) and a
200 GB SATA drive.
I was able to find out that "Aiee" typically points to a hardware error;
Intel has a nice article about it:
http://resource.intel.com/telecom/support/tnotes/tnbyos/2000/tn062.htm

Following this lead, I tried to pinpoint the hardware error.
It took me one week of investigation, trying different things:

1.) I ran memtest86 and got some errors the first time, so I ran the
same test on the individual sticks (I have 2 x 512 MB); each stick
passed without errors.
I then swapped the sticks between the two slots and ran memtest86 again
overnight.  The test completed 17 passes without any error.
So I excluded memory as the culprit.

2.) I disabled the network controller on the motherboard and installed
another one on the PCI bus.  This eliminated a possible IRQ conflict,
since the SATA drive on channel 0 was sharing an IRQ with the network
controller.
It didn't help.

3.) I removed the heatsink, cleaned it with 99% isopropyl alcohol and
applied a thin layer of new heatsink grease.
It did not help.  I still wanted to try, as per Robert C.'s suggestion,
"some arctic silver compound instead. It's good for a 3-5C. drop from
the regular stuff."
However, when I opened the box cover the CPU temperature dropped from
about 40C to about 35C / 36C, so I decided to follow some other leads
first.

4.) I removed the SATA drive and tried to install Gentoo on a standard
IDE drive; this would eliminate a SCSI problem and/or a buggy driver.
It did not help.  I hadn't even completed a base installation when I got
the same error message:
"Kernel panic - not syncing: Aiee, killing interrupt handler."

Then I got a lead from Francesco T.:
"Sometimes memtest doesn't stress enough the hardware, see:
http://people.redhat.com/dledford/memtest.html
..."

So it made me think about the memory again.  I swapped the two sticks
for the two sticks from one of my backup servers (PC2100, 2 x 512 MB).

I downloaded a Linux kernel source tarball, but it needs to be modified
because the Red Hat memtest.sh looks for a top-level directory named
"linux", not "linux-2.6.-something".
Instead of modifying the script, it is easier to just repack the kernel
source (thanks to Richard F. for the help):
tar -xzvf linux.tar.gz
mv linux-* linux
tar -czvf linux.tar.gz linux

One more thing: change the first line of the script from:
 #!/bin/bash2
to:
 #!/bin/bash
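
For anyone who wants to script the two preparation steps, here is a
minimal sketch.  The function names and file names are my own
placeholders, not anything from the Red Hat page, and the sed call
assumes GNU sed (for -i) and a tarball with a single top-level
"linux-*" directory.

```shell
#!/bin/bash
# Sketch only: helper names and paths are hypothetical.

# repack_as_linux ORIGINAL.tar.gz OUTPUT.tar.gz
# Repack a kernel tarball so that its top-level directory is "linux".
repack_as_linux() {
    local src=$1 out=$2 work
    work=$(mktemp -d) || return 1
    tar -xzf "$src" -C "$work" || return 1
    # Assumes exactly one top-level directory matching linux-*
    mv "$work"/linux-* "$work/linux" || return 1
    tar -czf "$out" -C "$work" linux || return 1
    rm -rf "$work"
}

# fix_shebang SCRIPT
# Rewrite line 1 only, and only if it is exactly "#!/bin/bash2".
fix_shebang() {
    sed -i '1s|^#!/bin/bash2$|#!/bin/bash|' "$1"
}
```

Usage would be something like "repack_as_linux linux-2.6.x.tar.gz
linux.tar.gz" followed by "fix_shebang memtest.sh", with the real file
names substituted.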

I ran the Red Hat memory test on my main server (a different box;
20 passes, standard script setup) and it went just fine.  It finished
with an empty line, i.e. no errors, as the web page describes:
---quote----
How do you know if your memory passed?

Very simple. If you run that script from the command line on your
computer and it completes without ever spewing a single message onto
your screen, then you passed. If you get messages from diff about
differences between files or any other anomolies such as that, then you
failed.
---end quote-----
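
To illustrate why silence means a pass, here is a rough sketch of the
unpack-and-compare idea as I understand it from the quote above.  This
is NOT the Red Hat memtest.sh itself; the function name, its arguments
and the directory layout are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of the unpack-and-compare idea; NOT the Red Hat memtest.sh.

# stress_compare TARBALL COPIES WORKDIR
# Extracts TARBALL into COPIES separate directories at the same time,
# then compares every copy against the first one.  Output appears (and
# the return value is non-zero) only when two copies differ.
stress_compare() {
    local tarball=$1 copies=$2 work=$3 i status=0
    for i in $(seq 1 "$copies"); do
        mkdir -p "$work/$i"
        # Parallel extraction keeps CPU, memory and disk busy at once
        tar -xzf "$tarball" -C "$work/$i" &
    done
    wait    # let every extraction finish before comparing
    for i in $(seq 2 "$copies"); do
        # diff -r -q prints nothing when the two trees are identical
        diff -r -q "$work/1" "$work/$i" || status=1
    done
    return $status
}
```

Identical input unpacked several times should give identical trees, so
any line from diff means data was corrupted somewhere between the disk,
the RAM and the CPU.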

I ran some compiles and did not get any errors or kernel panics.  I also
ran the Red Hat memory test on the memory sticks from my backup server,
and it finished without a single error message.

So at this point I knew the problem was the original memory sticks.
I put back the original memory sticks and the SATA drive, and went back
to the onboard network controller.
When I tried to run the Red Hat memtest.sh, it froze with the same
kernel panic:
"Kernel panic - not syncing: Aiee, killing interrupt handler."

It appears the test only made it into the fourth round before it froze.
It did not print any message to the screen; it just froze with the
kernel panic as always.  So I wasn't 100% sure that this would qualify
as a failed memory test:
"...If you get messages from diff about differences between files or any
other anomolies such as that, then you failed."
But I suppose it would qualify; you be the judge.
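
Since a hard freeze leaves nothing on the screen, one way to see how far
a run got is to write each pass to a log file and flush it to disk
before the pass starts.  This is only a sketch of that idea, not
something I used at the time; run_passes and the log format are
hypothetical.

```shell
#!/bin/bash
# Sketch only: run_passes and its log format are made up for this post.

# run_passes COUNT LOGFILE COMMAND [ARGS...]
# Runs COMMAND up to COUNT times, recording progress in LOGFILE.
run_passes() {
    local passes=$1 log=$2 i
    shift 2
    for i in $(seq 1 "$passes"); do
        echo "pass $i started" >> "$log"
        sync    # flush the log to disk before the risky work begins
        if ! "$@"; then
            echo "pass $i FAILED" >> "$log"
            sync
            return 1
        fi
        echo "pass $i ok" >> "$log"
    done
    sync
}
```

After a reboot, a last line of "pass 4 started" with no matching "ok"
would show the box died during the fourth pass.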

Anyhow, I replaced the pair of sticks with two new ones and ran
memtest.sh for 30 passes.  It passed without printing a single error:
a clean finish.
I was then able to emerge "kde-meta", and it finished without a single
hiccup.

Thank you ALL for your suggestions and help; it appears another mystery
has been solved.
So my conclusion: do not rely on memtest86 alone; RAM that passes it can
still fail under real load.

-- 
#Joseph

On Sat, 2005-07-23 at 20:23 +0200, Richard Fish wrote:
> Joseph wrote:
> 
> >>[...]
> >>    
> >>
> >>>-bash: ./memtest.sh: /bin/bash2: bad interpreter: No such file or
> >>>directory
> >>>
> >>>On both boxes I have bash-3.0, so what is it looking for?
> >>>      
> >>>
> >>Correct the first line of the script from "#!/bin/bash2" to 
> >>"#!/bin/bash" and everything will be fine.
> >>
> >>Ciao
> >>    Francesco
> >>    
> >>
> >
> >Thank you, yes that is what I did as soon as I posted the message.
> >Though it puzzles me why it runs on my main server and not on the new
> >box?
> >
> >Ps. my main server passed memtest.sh without any errors; I'm only
> >curious about the result for those bad RAM sticks that pass memtest86
> >on the new box.  I will re-run both tests and post the results.
> >  
> >
> 
> My guess is still that if you relax the memory timings in the BIOS, the 
> "bad" RAM will start to work fine.  Of course, *I* would still return it 
> and get RAM that actually performs to the specs on the box, but that's 
> just me! :->
> 
> -Richard
> 
> 

-- 
gentoo-user@gentoo.org mailing list
