Hi,
I think I've found a stability problem in the SCSI subsystem of the
current kernel, 2.4.2.
The setup:
Dual-Pentium2 512MB RAM
onboard Adaptec Controller AIC7895 (Dual Channel)
3 internal disks, IBM DDRS-39130D
1 external disk easyRAID II with 360GB net capacity
The easyRAID II is a RAID system that uses IDE disks internally and
presents a SCSI interface to the outside world. The RAID's SCSI
interface is a Symbios 53C895.
The system also contains 2 network cards and a graphics card.
I use the low-level SCSI driver for AIC-7xxx, built into the kernel.
I have updated all the necessary tools as described in the
Documentation/Changes file.
gcc-2.91.66
binutils-2.10.1
util-linux-2.10s
e2fsprogs-1.19
The problem occurs when I try to create a file system (or copy lots of
files to an existing file system) on an internal disk or on the
external RAID. After a while, a lot of read errors are reported on the
disks, the system becomes unusable, and in most cases the external RAID
crashes.
After rebooting the machine and resetting the RAID box, everything
seems to be fine and there are no disk errors. I can run three or more
badblocks scans in parallel and all sectors are OK, but when I then try
to create a file system (ext2 or reiserfs), the same thing happens as
described above.
I then tested another setup: a Dual-Pentium3 with 1GB RAM and an
Adaptec 29160 SCSI controller, with the same disks and the RAID box
connected. The same errors came up when I tried to write large amounts
of data; disk reads cause no problems.
Small writes to the disk, e.g. editing a file and saving it, also cause
no errors (in both setups described).
The problem definitely goes away when going back to 2.2.18.
Perhaps it is similar to the problem with kernel 2.2.13, which doesn't
recognize the IBM DDYS-T18350 disk, while 2.2.18 does.
Thanks for your attention.
With best regards
Sebastian Woelk
System administrator - Werner-Seelenbinder-Schule Berlin (Germany)
Below I've pasted some messages from my system log.
Mar 3 01:31:30 tenkei kernel: (scsi1:0:12:-1) Unexpected busfree,
LASTPHASE = 0x0, SEQADDR = 0x110
Mar 3 01:31:36 tenkei kernel: (scsi1:0:12:-1) Unexpected busfree,
LASTPHASE = 0x0, SEQADDR = 0x58
Mar 3 01:31:48 tenkei last message repeated 2 times
Mar 3 01:31:54 tenkei kernel: (scsi1:0:12:-1) Unexpected busfree,
LASTPHASE = 0x0, SEQADDR = 0x59
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 11 9c c4 50 00 04 00
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 11 9c c8 50 00 04 00
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 11 9c cc 50 00 04 00
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 11 9c d0 50 00 04 00
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 11 9c d4 50 00 02 e0
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0b eb c0 00 00 00 08
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0b f3 c0 30 00 00 08
00
Mar 3 01:31:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0b fb c0 30 00 00 08
00
Mar 3 01:32:15 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0b ff c0 38 00 00 08
00
Mar 3 01:32:23 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 13 c0 30 00 00 08
00
Mar 3 01:32:23 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 2f c0 30 00 00 08
00
Mar 3 01:32:23 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 33 c0 30 00 00 10
00
Mar 3 01:33:02 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 37 c0 30 00 00 08
00
Mar 3 01:33:16 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 3b c0 30 00 00 08
00
Mar 3 01:33:27 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 3f c0 30 00 00 08
00
Mar 3 01:33:38 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 43 c0 30 00 00 08
00
Mar 3 01:33:48 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 47 c0 30 00 00 08
00
Mar 3 01:33:58 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 4b c0 30 00 00 08
00
Mar 3 01:34:09 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 53 c0 30 00 00 08
00
Mar 3 01:34:18 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 5b c0 30 00 00 08
00
Mar 3 01:34:26 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 5f c0 30 00 00 10
00
Mar 3 01:34:35 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 63 c0 30 00 00 08
00
Mar 3 01:34:42 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 67 c0 30 00 00 08
00
Mar 3 01:34:50 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 6b c0 30 00 00 10
00
Mar 3 01:34:58 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 6f c0 30 00 00 10
00
Mar 3 01:35:05 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 73 c0 30 00 00 08
00
Mar 3 01:35:13 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c 77 c0 30 00 00 38
00
Mar 3 01:35:20 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c cb c0 50 00 00 08
00
Mar 3 01:35:27 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c cb c0 68 00 00 08
00
Mar 3 01:35:34 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c cb c0 80 00 00 18
00
Mar 3 01:35:41 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c cb c0 c8 00 00 08
00
Mar 3 01:35:48 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c cb c0 e8 00 00 08
00
Mar 3 01:35:55 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0c f7 c0 30 00 00 20
00
Mar 3 01:36:02 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0d 1f c0 38 00 00 08
00
Mar 3 01:36:08 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0d 6b c0 38 00 00 30
00
Mar 3 01:36:15 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0d 6b c0 78 00 00 08
00
Mar 3 01:36:21 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 12, lun 0 Write (10) 00 0d 6b c0 a8 00 00 30
00
Mar 3 01:36:28 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 0, lun 0 Write (10) 00 00 21 57 f5 00 00 02
00
Mar 3 01:36:35 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 0, lun 0 Write (10) 00 00 22 d7 f3 00 00 04
00
Mar 3 01:36:41 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 0, lun 0 Write (10) 00 00 22 d7 fd 00 00 02
00
Mar 3 01:36:48 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 0, lun 0 Write (10) 00 00 23 97 d1 00 00 02
00
Mar 3 01:36:54 tenkei kernel: scsi : aborting command due to timeout :
pid 0, scsi1, channel 0, id 0, lun 0 Write (10) 00 00 23 d7 d1/O error:
dev 08:32, sector Mar 3 01:37:02 tenkei kernel: (scsi1:0:0:0)
Synchronous at 40.0 Mbyte/sec, offset 8.
Mar 3 01:37:08 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:37:15 tenkei kernel: I/O error: dev 08:32, sector 95483032
Mar 3 01:37:21 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:37:27 tenkei kernel: I/O error: dev 08:32, sector 39321896
Mar 3 01:37:33 tenkei kernel: scsi1 channel 0 : resetting for second
half of retries.
Mar 3 01:37:40 tenkei kernel: SCSI bus is being reset for host 1
channel 0.
Mar 3 01:37:46 tenkei kernel: SCSI host 1 channel 0 reset (pid 0) timed
out - trying harder
Mar 3 01:37:53 tenkei kernel: SCSI bus is being reset for host 1
channel 0.
Mar 3 01:38:01 tenkei kernel: SCSI host 1 reset (pid 0) timed out again
-
Mar 3 01:38:07 tenkei kernel: probably an unrecoverable SCSI bus or
device hang.
Mar 3 01:38:14 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:38:20 tenkei kernel: I/O error: dev 08:32, sector 95480984
Mar 3 01:38:27 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:38:33 tenkei kernel: I/O error: dev 08:32, sector 95484056
Mar 3 01:38:40 tenkei kernel: (scsi1:0:0:0) Synchronous at 40.0
Mbyte/sec, offset 8.
Mar 3 01:38:47 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:38:54 tenkei kernel: I/O error: dev 08:32, sector 60031272
Mar 3 01:39:00 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:39:07 tenkei kernel: I/O error: dev 08:32, sector 95489176
Mar 3 01:39:13 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:39:20 tenkei kernel: I/O error: dev 08:32, sector 95485080
Mar 3 01:39:26 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:39:34 tenkei kernel: I/O error: dev 08:32, sector 95490200
Mar 3 01:39:40 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:39:47 tenkei kernel: I/O error: dev 08:32, sector 95487128
Mar 3 01:39:47 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:39:55 tenkei kernel: I/O error: dev 08:32, sector 95482008
Mar 3 01:40:01 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:40:07 tenkei kernel: I/O error: dev 08:32, sector 92536992
Mar 3 01:40:14 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:40:20 tenkei kernel: I/O error: dev 08:32, sector 95486104
Mar 3 01:40:27 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:40:33 tenkei kernel: I/O error: dev 08:32, sector 95488152
Mar 3 01:40:40 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:40:46 tenkei kernel: I/O error: dev 08:32, sector 95483040
Mar 3 01:40:53 tenkei kernel: scsi1 channel 0 : resetting for second
half of retries.
Mar 3 01:41:00 tenkei kernel: SCSI bus is being reset for host 1
channel 0.
Mar 3 01:41:06 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:41:13 tenkei kernel: I/O error: dev 08:32, sector 95491224
Mar 3 01:41:19 tenkei kernel: (scsi1:0:0:0) Synchronous at 40.0
Mbyte/sec, offset 8.
Mar 3 01:41:25 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:41:32 tenkei kernel: I/O error: dev 08:32, sector 39321904
Mar 3 01:41:38 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:41:51 tenkei kernel: I/O error: dev 08:32, sector 95480992
Mar 3 01:41:52 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:42:06 tenkei kernel: I/O error: dev 08:32, sector 60031280
Mar 3 01:42:12 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:42:19 tenkei kernel: I/O error: dev 08:32, sector 95489184
Mar 3 01:42:25 tenkei kernel: scsi1 channel 0 : resetting for second
half of retries.
Mar 3 01:42:32 tenkei kernel: SCSI bus is being reset for host 1
channel 0.
Mar 3 01:42:38 tenkei kernel: SCSI host 1 channel 0 reset (pid 0) timed
out - trying harder
Mar 3 01:42:45 tenkei kernel: SCSI bus is being reset for host 1
channel 0.
Mar 3 01:42:52 tenkei kernel: SCSI host 1 reset (pid 0) timed out again
-
Mar 3 01:42:58 tenkei kernel: probably an unrecoverable SCSI bus or
device hang.
Mar 3 01:43:05 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
Mar 3 01:43:12 tenkei kernel: I/O error: dev 08:32, sector 95484064
Mar 3 01:43:19 tenkei kernel: (scsi1:0:0:0) Synchronous at 40.0
Mbyte/sec, offset 8.
Mar 3 01:43:26 tenkei kernel: SCSI disk error : host 1 channel 0 id 12
lun 0 return code = 26030000
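
To make the log above easier to read, here is a minimal Python sketch
that decodes three of the fields that recur in it. It rests on
assumptions that are not stated in the log itself: that the bytes
printed after "Write (10)" are CDB bytes 1 to 9 (flags, 4-byte LBA,
group, 2-byte transfer length, control); that "return code" is the
usual mid-layer result word (driver byte, host byte, message byte,
status byte, from high to low); and that "dev 08:32" is major:minor
printed in hex, with 16 minors per SCSI disk on major 8. The name
tables are recalled from the kernel's SCSI headers and are deliberately
incomplete. All function names are hypothetical helpers, not kernel
or library APIs.

```python
# Assumed host-byte and driver-byte names (partial tables, from memory
# of the kernel SCSI headers; treat as an aid, not a reference).
HOST_BYTES = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x02: "DID_BUS_BUSY",
              0x03: "DID_TIME_OUT", 0x04: "DID_BAD_TARGET", 0x05: "DID_ABORT",
              0x06: "DID_PARITY", 0x07: "DID_ERROR", 0x08: "DID_RESET"}
DRIVER_BYTES = {0x00: "DRIVER_OK", 0x06: "DRIVER_TIMEOUT", 0x08: "DRIVER_SENSE"}
SUGGEST_BYTES = {0x00: None, 0x10: "SUGGEST_RETRY", 0x20: "SUGGEST_ABORT"}


def decode_write10(tail_hex):
    """Return (lba, blocks) from the 9 bytes logged after 'Write (10)'."""
    b = bytes.fromhex(tail_hex.replace(" ", ""))
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    lba = int.from_bytes(b[1:5], "big")      # logical block address
    blocks = int.from_bytes(b[6:8], "big")   # transfer length in blocks
    return lba, blocks


def decode_result(code_hex):
    """Split a 'return code = ...' value into its four bytes."""
    v = int(code_hex, 16)
    driver = (v >> 24) & 0xFF
    host = (v >> 16) & 0xFF
    return {
        "driver": DRIVER_BYTES.get(driver & 0x0F, hex(driver & 0x0F)),
        "suggest": SUGGEST_BYTES.get(driver & 0xF0, hex(driver & 0xF0)),
        "host": HOST_BYTES.get(host, hex(host)),
        "msg": (v >> 8) & 0xFF,
        "status": v & 0xFF,
    }


def decode_dev(dev):
    """Return (major, minor, name) for a 'dev MM:mm' field (hex assumed)."""
    major_s, minor_s = dev.split(":")
    major, minor = int(major_s, 16), int(minor_s, 16)
    name = None
    if major == 8:  # first SCSI disk major: 16 minors per disk
        disk, part = divmod(minor, 16)
        name = "sd" + chr(ord("a") + disk) + (str(part) if part else "")
    return major, minor, name


if __name__ == "__main__":
    print(decode_write10("00 11 9c c4 50 00 04 00 00"))  # (295486544, 1024)
    print(decode_result("26030000"))
    print(decode_dev("08:32"))                           # (8, 50, 'sdd2')
```

Under these assumptions, the first five aborted commands are
consecutive writes to id 12 (the next LBA is 0x400 higher each time),
return code 26030000 reads as a timeout at both the driver and host
level with no SCSI status from the target, and dev 08:32 is the second
partition of the fourth SCSI disk.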