Jens Axboe wrote:
On Wed, Jul 06 2005, Jens Axboe wrote:

On Mon, Jul 04 2005, Jens Axboe wrote:

On Fri, Jul 01 2005, Jens Axboe wrote:

I converted most of the debug messages I used during development into
warning messages when posting the patchset and then forgot about it, so
I never posted the debug patch.  Sorry about that.  Here's a small
patch which adds some more messages though.  The patch also adds a
printk of the FIS on each command issue in ahci.c:ahci_qc_issue();
if you think it would fill your log excessively, feel free to turn it
off.  It probably wouldn't matter anyway.

I will have to kill the issue part of the patch; that would generate
insane amounts of printk traffic :-)

I'll boot the kernel and report what happens.

It triggered last night, but the old kernel was booted. This was the
log:

ahci ata1: stat=d0, issuing COMRESET
ata1: recovering from error
ata1: status=0x01 { Error }
ata1: error=0x80 { Sector }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
   ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66255899
Buffer I/O error on device sda2, logical block 8018923
lost page write due to I/O error on sda2
ata1: status=0x01 { Error }
ata1: error=0x80 { Sector }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
   ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66239043
Buffer I/O error on device sda2, logical block 8016816
lost page write due to I/O error on sda2
ata1: recovering from error
ata1: status=0x01 { Error }
ata1: error=0x80 { Sector }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
   ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66239051
Buffer I/O error on device sda2, logical block 8016817
lost page write due to I/O error on sda2
ata1: status=0x01 { Error }
ata1: error=0x80 { Sector }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
   ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 35137043
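
For readers following the log, the status bytes can be decoded by hand.
The sketch below is only an illustration, not code from the patchset: the
enum names and decode_ata_status() are made up here, and the bit positions
are the standard ATA status register bits.  With it, stat=d0 reads as
BSY, DRDY and DSC set, i.e. the device was still busy when the command
timed out, and status=0x01 is just the ERR bit.

```c
#include <string.h>

/* ATA status register bits (names are local to this sketch). */
enum {
    ST_ERR  = 0x01,  /* error, details in the error register */
    ST_DRQ  = 0x08,  /* data request */
    ST_DSC  = 0x10,  /* seek complete (obsolete in later specs) */
    ST_DF   = 0x20,  /* device fault */
    ST_DRDY = 0x40,  /* device ready */
    ST_BSY  = 0x80,  /* device busy */
};

/*
 * Write a space-separated list of the set status bits into buf.
 * len is the size of buf and must be at least 1.
 */
static const char *decode_ata_status(unsigned char stat, char *buf, size_t len)
{
    static const struct {
        unsigned char bit;
        const char *name;
    } bits[] = {
        { ST_BSY, "BSY" }, { ST_DRDY, "DRDY" }, { ST_DF, "DF" },
        { ST_DSC, "DSC" }, { ST_DRQ, "DRQ" }, { ST_ERR, "ERR" },
    };
    size_t i;

    buf[0] = '\0';
    for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(stat & bits[i].bit))
            continue;
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, bits[i].name, len - strlen(buf) - 1);
    }
    return buf;
}
```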

This is with the extra debug. Given that it is the timeout triggering,
only the sstatus is new.

ahci ata1: stat=d0, issuing COMRESET
ata1: started resetting...
ata1: end resetting, sstatus=00000113
ata1: recovering from error
ata1: status=0x01 { Error }
ata1: error=0x80 { Sector }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
   ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66190875
Buffer I/O error on device sda2, logical block 8010795
lost page write due to I/O error on sda2
ata1: status=0x01 { Error }
ata1: error=0x80 { Sector }
SCSI error : <0 0 0 0> return code = 0x8000002
sda: Current: sense key=0x3
   ASC=0x11 ASCQ=0x4
end_request: I/O error, dev sda, sector 66159699
Buffer I/O error on device sda2, logical block 8006898
lost page write due to I/O error on sda2
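
Since sstatus is the only new information in this log, here is a small
sketch of how the value splits into its fields.  The struct and function
names are hypothetical; the field layout (DET in bits 3:0, SPD in bits
7:4, IPM in bits 11:8) is the SATA SStatus register layout.  For
sstatus=00000113 this gives DET=3 (device present, phy communication
established), SPD=1 (first-generation link speed) and IPM=1 (interface
active), so the link itself looks fine after the reset.

```c
#include <stdint.h>

/* Fields of the SATA SStatus register (struct name is made up here). */
struct sstatus_fields {
    unsigned det;  /* bits 3:0   device detection */
    unsigned spd;  /* bits 7:4   negotiated interface speed */
    unsigned ipm;  /* bits 11:8  interface power management state */
};

/* Split a raw SStatus value into its three 4-bit fields. */
static struct sstatus_fields decode_sstatus(uint32_t sstatus)
{
    struct sstatus_fields f;

    f.det = sstatus & 0xf;
    f.spd = (sstatus >> 4) & 0xf;
    f.ipm = (sstatus >> 8) & 0xf;
    return f;
}

/* Nonzero when a device is detected and the phy link is up (DET == 3). */
static int sstatus_link_up(uint32_t sstatus)
{
    return (sstatus & 0xf) == 0x3;
}
```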


btw, the reason it hangs here (I suspect) is that your read_log_page()
logic is wrong - not every error condition will have NCQ_FAILED set
before entering ncq_recover. The timeout will not, for instance.
Testing... As usual, this will take days.


I thought log page 10h would be valid only after the drive reported an
error during NCQ processing.  That's why it doesn't read the log page on
timeouts.  Hmmm, maybe we should read log page 10h on any NCQ failure
but discard the result on timeout.  Please let me know how your testing
goes.
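
To make the idea concrete, a rough sketch of "read on any NCQ failure,
discard on timeout" might look like the following.  This is not the
patchset's code: all names are hypothetical, the caller is assumed to
have already read log page 10h into a 512-byte buffer regardless of the
failure cause, and the byte layout used (tag in bits 4:0 of byte 0, NQ
flag in bit 7 of byte 0, status in byte 2, error in byte 3) is my
reading of the NCQ error log page and should be checked against the
spec.

```c
#include <stdint.h>

/* Why NCQ recovery was entered (names are hypothetical). */
enum ncq_fail_cause {
    NCQ_FAIL_DEVICE_ERROR,  /* the drive reported an error */
    NCQ_FAIL_TIMEOUT,       /* a command timed out, no error reported */
};

struct ncq_err_info {
    int valid;       /* nonzero if the fields below may be trusted */
    unsigned tag;    /* tag of the failed queued command */
    uint8_t status;  /* status register value at the time of failure */
    uint8_t error;   /* error register value at the time of failure */
};

/*
 * Interpret an already-read log page 10h.  The page is read on every
 * NCQ failure, but its contents are only trusted when the drive itself
 * reported the error.
 */
static struct ncq_err_info
ncq_interpret_log_10h(enum ncq_fail_cause cause, const uint8_t *page)
{
    struct ncq_err_info info = { 0, 0, 0, 0 };

    if (cause == NCQ_FAIL_TIMEOUT)
        return info;  /* timeout: discard whatever the page says */

    if (page[0] & 0x80)
        return info;  /* NQ set: failed command was not a queued one */

    info.valid = 1;
    info.tag = page[0] & 0x1f;
    info.status = page[2];
    info.error = page[3];
    return info;
}
```

The point of splitting it this way is that ncq_recover would no longer
depend on NCQ_FAILED having been set before it runs; the timeout path
takes the same route and simply ignores the result.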

 Thanks. :-)

--
tejun