Hi, Setting up and testing my new system (after wasting nearly 1 month with bad RAM modules), I got this error today:
Hi,

While setting up and testing my new system (after wasting nearly a month on bad RAM modules), I got this error today:

[48055.741389] ata3.00: exception Emask 0x0 SAct 0x2 SErr 0x0 action 0x6 frozen
[48055.741393] ata3.00: failed command: READ FPDMA QUEUED
[48055.741398] ata3.00: cmd 60/20:08:38:15:03/01:00:18:00:00/40 tag 1 ncq 147456 in
[48055.741400] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[48055.741402] ata3.00: status: { DRDY }
[48055.741405] ata3: hard resetting link
[48056.198746] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[48056.210514] ata3.00: configured for UDMA/133
[48056.210518] ata3.00: device reported invalid CHS sector 0
[48056.210523] ata3: EH complete

I really don't understand what it means, but the "timeout", "hard resetting link" and "invalid CHS sector 0" parts look scary to me...

The initial boot messages for this device were:

Mar 25 22:02:32 [kernel] [ 4.496102] ata3: SATA max UDMA/133 abar m2...@0xfbffc000 port 0xfbffc200 irq 34
Mar 25 22:02:32 [kernel] [ 8.519169] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 25 22:02:32 [kernel] [ 8.536681] ata3.00: ATA-8: SAMSUNG HD203WI, 1AN10002, max UDMA/133
Mar 25 22:02:32 [kernel] [ 8.548388] ata3.00: 3907029168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 25 22:02:32 [kernel] [ 8.566100] ata3.00: configured for UDMA/133

That disk is part of an md RAID5 array, but I was at work when the error happened, so I don't know whether the array repaired itself or what else would happen in this case (I don't have mdadm monitoring configured yet). Right now all RAID disks are up and healthy.

I googled it, but most of the results are pastebin snippets. I'm using kernel 2.6.33 and the ahci driver for the SATA controllers. The libata documentation, in its section about timeout errors, says:

"Most often this is due to an unrelated interrupt subsystem bug (try booting with 'pci=nomsi' or 'acpi=off' or 'noapic'), which failed to deliver an interrupt when we were expecting one from the hardware."
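To make the "cmd" line in the log less cryptic, here is a small sketch that decodes that taskfile dump. It assumes the field order libata uses when printing a failed command (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device) and the NCQ convention that the sector count lives in the FEATURE registers while the queue tag sits in bits 7:3 of the SECTOR COUNT register. The function name `decode_ncq_taskfile` is hypothetical, and the byte count assumes 512-byte logical sectors.

```python
def decode_ncq_taskfile(dump):
    """Decode an NCQ (READ/WRITE FPDMA QUEUED) taskfile dump as printed by
    libata, e.g. '60/20:08:38:15:03/01:00:18:00:00/40'.
    Field order is an assumption based on libata's error-handler output."""
    fields = [int(x, 16) for x in dump.replace("/", ":").split(":")]
    if len(fields) != 12:
        raise ValueError(f"expected 12 taskfile registers, got {len(fields)}")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields

    # NCQ commands carry the transfer length in the FEATURE registers.
    sectors = (hob_feature << 8) | feature
    if sectors == 0:
        sectors = 65536  # in 48-bit commands a count of 0 means 65536
    # The NCQ queue tag is stored in bits 7:3 of the SECTOR COUNT register.
    tag = nsect >> 3
    # 48-bit LBA: the HOB ("high order byte") registers hold bits 47:24.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "command": command,
        "tag": tag,
        "sectors": sectors,
        "bytes": sectors * 512,  # assumes 512-byte logical sectors
        "lba": lba,
        "device": device,
    }


info = decode_ncq_taskfile("60/20:08:38:15:03/01:00:18:00:00/40")
print(f"command=0x{info['command']:02x} tag={info['tag']} "
      f"sectors={info['sectors']} bytes={info['bytes']} lba={info['lba']}")
# prints: command=0x60 tag=1 sectors=288 bytes=147456 lba=402855224
```

The decoded tag (1) and byte count (147456) agree with "tag 1 ncq 147456 in" on the same log line, which is a useful sanity check on the assumed field order. The LBA is also well inside the 3907029168 sectors the drive reported at boot.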
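Before trying 'pci=nomsi' or 'noapic', it helps to see which interrupt type each AHCI controller actually uses. The sketch below scans /proc/interrupts text for rows naming an ahci device and reports the IRQ number with its chip/handler column (e.g. "PCI-MSI-edge" vs "IO-APIC-fasteoi"). It assumes the 2.6-era layout "IRQ: per-CPU counts... CHIP-HANDLER device[, device...]"; the function name and the sample text are made up for illustration, not taken from this system.

```python
from typing import List, Tuple

# Hypothetical excerpt in the 2.6-era /proc/interrupts layout (not real data).
SAMPLE = """\
           CPU0       CPU1
 16:        512          0   IO-APIC-fasteoi   ahci, uhci_hcd:usb3
 34:     120877          0   PCI-MSI-edge      ahci
NMI:          0          0   Non-maskable interrupts
"""


def ahci_irq_types(text: str) -> List[Tuple[str, str]]:
    """Return (irq, chip/handler type) for each row that lists an ahci device."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Skip the CPU header and blank lines: real rows start with "NN:".
        if not tokens or not tokens[0].endswith(":"):
            continue
        irq = tokens[0][:-1]
        # Skip the per-CPU interrupt counters.
        i = 1
        while i < len(tokens) and tokens[i].isdigit():
            i += 1
        if i >= len(tokens):
            continue  # e.g. "ERR: 0" has no chip or device columns
        chip = tokens[i]
        devices = " ".join(tokens[i + 1:])
        if "ahci" in devices:
            rows.append((irq, chip))
    return rows


if __name__ == "__main__":
    # On a live system, pass open("/proc/interrupts").read() instead of SAMPLE.
    for irq, chip in ahci_irq_types(SAMPLE):
        kind = "MSI" if "MSI" in chip else ("IO-APIC" if "APIC" in chip else "other")
        print(f"IRQ {irq}: {chip} ({kind})")
```

A row whose chip column contains "MSI" is delivered via MSI, so it is the one that 'pci=nomsi' would move back to the IO-APIC.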
I don't really know the implications of disabling MSI or APIC, but in /proc/interrupts I do see AHCI entries on both MSI and APIC rows, so at least I know both are active right now.

Temperatures in my system are fine; hddtemp reports the drive in question at 21 °C right now.

Another possibility is that I need to increase a voltage on the motherboard (an X58 board), since it is powering 6 HDDs and 1 DVD-ROM drive. I'll have to research which voltage is relevant here.

Thanks in advance if anyone knows anything about this; otherwise I'll switch to trial-and-hopefully-no-error mode. :)

Paul