Hi, what's the difference between a standard kernel and a kernel that comes as a Debian package?
I'm using a standard kernel, but I'm having problems with one of my disks (see below). The disk "gets lost" every now and then; at the moment it seems to take a couple of days or weeks before it happens (with the old board I've seen it take as long as about two months). The disk remains unavailable until I turn the power off and back on. Once the disk is back, I can re-add the partitions on the failed disk to the md devices. They are rebuilt just fine, and everything works for some time until the disk "gets lost" again.

This problem isn't new; it has been there with another board/CPU/RAM, cables and power supply ever since I got the two SATA disks new. It's been there with every standard kernel I tried over the years on i386, and now it's the same on amd64. I had been thinking it was a problem with the board I had, but since it also happens with another board etc., it must be either the disk itself or the SATA driver.

Googling revealed that this isn't a rare problem. There are people reporting it with all kinds of different disks, boards and distributions. Some suggest that it's a problem with the PSU or the SATA cables, but imho that's unlikely. Interestingly, the problem seems to show up more often in RAID setups. Also interestingly, mdadm did *not* detect the disk failure for /dev/md2, which is mounted read-only. And even more interestingly, the problem is and has always been with /dev/sdb, never with /dev/sda. I can't tell whether the disks were swapped when I connected them to the new board, though. But I'd rule out a problem with the firmware of the disk as well, since both disks use the same firmware version.

So is there a difference between Debian and standard kernels such that I might not have this problem if I used a Debian kernel? Has this problem been solved in some way yet? I might get another two disks, but I'm afraid that the same problem would come up with other disks as well ...
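The re-add step described above (putting the sdb partitions back into the md devices after a power cycle) could be scripted roughly as follows. This is only a sketch under assumptions: the pairing of /dev/md1 with /dev/sdb2 comes from the mdadm message in the log below, the pairing of /dev/md0 with /dev/sdb1 is merely inferred from it, and /dev/md2 is left out because its component partition is not shown anywhere in this post. The names ARRAYS, readd_commands and run are made up for illustration. By default the script only prints the commands instead of running them.

```python
import subprocess

# Assumed layout, partly inferred from the log excerpt in this post.
# Adjust to the real arrays before use; /dev/md2 is deliberately omitted.
ARRAYS = {
    "/dev/md0": "/dev/sdb1",
    "/dev/md1": "/dev/sdb2",
}


def readd_commands(arrays):
    """Build one 'mdadm --manage ARRAY --add PARTITION' command per array."""
    return [
        ["mdadm", "--manage", md, "--add", part]
        for md, part in sorted(arrays.items())
    ]


def run(commands, dry_run=True):
    """Print the commands (dry run) or execute them (requires root)."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if mdadm exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(readd_commands(ARRAYS))
```

Run as-is, it prints the two mdadm commands; calling run(..., dry_run=False) as root would actually execute them, after which the rebuild progress shows up in /proc/mdstat.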
Info:

cat:/home/lee# uname -a
Linux cat 2.6.27.7-cat-smp #4 SMP Thu Dec 4 16:03:29 CST 2008 x86_64 GNU/Linux

cat:/home/lee# smartctl -i /dev/sda
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor MaXLine III family (SATA/300)
Device Model:     Maxtor 7V300F0
Serial Number:    V604E3FG
Firmware Version: VA111630
User Capacity:    300,090,728,448 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Dec 10 15:00:04 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

cat:/home/lee# smartctl -i /dev/sdb
smartctl version 5.38 [x86_64-unknown-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Maxtor MaXLine III family (SATA/300)
Device Model:     Maxtor 7V300F0
Serial Number:    V601T7VG
Firmware Version: VA111630
User Capacity:    300,090,728,448 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Wed Dec 10 15:00:42 2008 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

cat:/home/lee# lspci
[...]
00:1f.2 SATA controller: Intel Corporation 82801IB (ICH9) 4 port SATA AHCI Controller (rev 02)

syslog:

Dec 10 00:09:10 cat kernel: ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Dec 10 00:09:10 cat kernel: ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
Dec 10 00:09:10 cat kernel: res 40/00:00:00:4f:c2/00:00:00:c2:00/00 Emask 0x4 (timeout)
Dec 10 00:09:10 cat kernel: ata5.00: status: { DRDY }
Dec 10 00:09:10 cat kernel: ata5: hard resetting link
Dec 10 00:09:10 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 10 00:09:15 cat kernel: ata5: hard resetting link
Dec 10 00:09:16 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 10 00:09:21 cat kernel: ata5: hard resetting link
Dec 10 00:09:21 cat kernel: ata5: SATA link down (SStatus 0 SControl 300)
Dec 10 00:09:21 cat kernel: ata5.00: disabled
Dec 10 00:09:21 cat kernel: end_request: I/O error, dev sdb, sector 478543967
Dec 10 00:09:21 cat kernel: md: super_written gets error=-5, uptodate=0
Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb2, disabling device.
Dec 10 00:09:21 cat kernel: raid1: Operation continuing on 1 devices.
Dec 10 00:09:21 cat kernel: ata5: EH complete
Dec 10 00:09:21 cat kernel: ata5.00: detaching (SCSI 4:0:0:0)
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Synchronizing SCSI cache
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Stopping disk
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] START_STOP FAILED
Dec 10 00:09:21 cat kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x04 driverbyte=0x00
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel: --- wd:1 rd:2
Dec 10 00:09:21 cat kernel: disk 0, wo:0, o:1, dev:sda2
Dec 10 00:09:21 cat kernel: disk 1, wo:1, o:0, dev:sdb2
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel: --- wd:1 rd:2
Dec 10 00:09:21 cat kernel: disk 0, wo:0, o:1, dev:sda2
Dec 10 00:09:21 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
Dec 10 00:09:21 cat kernel: scsi 4:0:0:0: rejecting I/O to dead device
Dec 10 00:09:21 cat kernel: end_request: I/O error, dev sdb, sector 146496512
Dec 10 00:09:21 cat kernel: md: super_written gets error=-5, uptodate=0
Dec 10 00:09:21 cat kernel: raid1: Disk failure on sdb1, disabling device.
Dec 10 00:09:21 cat kernel: raid1: Operation continuing on 1 devices.
Dec 10 00:09:21 cat mdadm[1995]: Fail event detected on md device /dev/md1, component device /dev/sdb2
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel: --- wd:1 rd:2
Dec 10 00:09:21 cat kernel: disk 0, wo:0, o:1, dev:sda1
Dec 10 00:09:21 cat kernel: disk 1, wo:1, o:0, dev:sdb1
Dec 10 00:09:21 cat kernel: RAID1 conf printout:
Dec 10 00:09:21 cat kernel: --- wd:1 rd:2
Dec 10 00:09:21 cat kernel: disk 0, wo:0, o:1, dev:sda1
Dec 10 00:10:21 cat mdadm[1995]: Fail event detected on md device /dev/md0

--
"Don't let them, daddy. Don't let the stars run down."
http://adin.dyndns.org/adin/TheLastQ.htm