Today I had another SCSI failure. I was able to capture a bit more of the
dmesg output, but I can't figure out what is going wrong.
In /var/log/messages, the unusual entries start with this, repeated a
couple of times:
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Parity error during Message-In phase
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Parity error during Data-In phase.
It goes on with a lot of messages similar to this (the pid, the id, and
everything to the right of 'lun 0' change from line to line):
Mar 28 12:00:45 mail kernel: scsi : aborting command due to timeout : pid 14301024,
scsi0, channel 0, id 0, lun 0 Write (10) 00 00 6b 0f 14 00 00 08 00
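For reference, the bytes after 'Write (10)' can be decoded to see which
blocks the timed-out command was addressing. This is only a rough sketch:
it assumes the kernel prints the nine CDB bytes that follow the opcode
(the opcode itself being shown as the name 'Write (10)'), and it uses the
standard WRITE(10) layout (flags, 4-byte big-endian LBA, group byte,
2-byte transfer length, control). The function name decode_rw10 is made
up for this example.

```python
def decode_rw10(cdb_tail_hex):
    """Decode the nine bytes logged after 'Write (10)' or 'Read (10)'.

    Assumption: the log omits the opcode byte and prints CDB bytes 1..9.
    """
    b = bytes.fromhex(cdb_tail_hex.replace(" ", ""))
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode name")
    return {
        "flags": b[0],                           # CDB byte 1
        "lba": int.from_bytes(b[1:5], "big"),    # CDB bytes 2-5
        "blocks": int.from_bytes(b[6:8], "big"), # CDB bytes 7-8
        "control": b[8],                         # CDB byte 9
    }


info = decode_rw10("00 00 6b 0f 14 00 00 08 00")
print("LBA:", info["lba"], "blocks:", info["blocks"])
```

Under those assumptions, the command above is an 8-block write starting
at LBA 0x006b0f14, i.e. ordinary small writes rather than anything
unusual.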
Then this (a lot of lines):
Mar 28 12:00:45 mail kernel: SCSI host 0 abort (pid 14301062) timed out - resetting
Mar 28 12:00:45 mail kernel: SCSI bus is being reset for host 0 channel 0.
Somewhere in between this shows up:
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Performing Domain validation.
Then this:
Mar 28 12:00:45 mail kernel: SCSI host 0 reset (pid 14301061) timed out again -
Mar 28 12:00:45 mail kernel: probably an unrecoverable SCSI bus or device hang.
And finally this:
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Successfully completed Domain validation.
Mar 28 12:00:45 mail kernel: (scsi0:0:2:0) Using asynchronous transfers.
Mar 28 12:00:45 mail kernel: (scsi0:0:1:0) Synchronous at 80.0 Mbyte/sec, offset 31.
Mar 28 12:00:45 mail kernel: (scsi0:0:0:0) Using asynchronous transfers.
followed by some more lines like the earlier messages. These are the last
entries I got in /var/log/messages before I did a hard reboot. The machine
was sort of alive (i.e. ping, httpd, php3... still responded), but I was
unable to log in (even locally). On the one console I had open I could
still run 'ls', 'free' and 'dmesg', but anything that touched the hard
disk froze. Even 'shutdown' and 'reboot' failed to execute.
The weird thing is that all of these messages occurred within a single
second (12:00:45).
Could someone with more SCSI experience help diagnose what might be
causing this?
Thanks, D.
PS: More info about the machine:
CPU: Dual P-III 500 MHz
Board: Intel L440GX
Disks: 4x IBM DNES-309170Y (3 RAID5 + 1 spare)
LAN: Integrated Intel EtherExpress Pro 10/100
cat /proc/interrupts
           CPU0       CPU1
  0:     253546     252241    IO-APIC-edge   timer
  1:         99        103    IO-APIC-edge   keyboard
  2:          0          0          XT-PIC   cascade
  4:        473        472    IO-APIC-edge   serial
  8:          0          0    IO-APIC-edge   rtc
 13:          1          0          XT-PIC   fpu
 19:     358370     359232   IO-APIC-level   aic7xxx, aic7xxx
 21:     225846     225239   IO-APIC-level   Intel EtherExpress Pro 10/100 Ethernet
cat /proc/ioports
0000-001f : dma1
0020-003f : pic1
0040-005f : timer
0060-006f : keyboard
0070-007f : rtc
0080-008f : dma page reg
00a0-00bf : pic2
00c0-00df : dma2
00f0-00ff : fpu
03c0-03df : vga+
03f8-03ff : serial(auto)
1080-109f : Intel Speedo3 Ethernet
1400-14be : aic7xxx
1800-18be : aic7xxx
uname -a
Linux my.host.name 2.2.13 #1 SMP Tue Mar 14 11:55:56 CET 2000 i686 unknown