AHCI and device timeout

Emmanuel Dreyfus Thu, 20 Oct 2016 06:05:35 -0700

Hello

On a Xen/i386 system, I observe that the kernel gets a lot of device
timeouts on SATA disks:


wd1a: device timeout reading fsbn 460630240 of 460630240-460630271 (wd1 bn 
460630303; cn 456974 tn 8 sn 7), retrying
ahcisata0 port 3: device present, speed: 3.0Gb/s
wd1: soft error (corrected)

That impacts performance a lot, since no I/O can happen during such an
event. Here is the relevant dmesg output:

ahcisata0 at pci0 dev 31 function 2: vendor 0x8086 product 0x1d02 (rev. 0x06)
ahcisata0: interrupting at ioapic0 pin 18, event channel 5
ahcisata0: AHCI revision 1.30, 6 ports, 32 slots, CAP 
0xe730ff45<EMS,PSC,SSC,PMD,ISS=0x3=Gen3,SCLO,SAL,SALP,SSNTF,SNCQ,S64A>
atabus0 at ahcisata0 channel 0
(...)
ahcisata0 port 3: device present, speed: 3.0Gb/s
(...)
wd1 at atabus3 drive 0
wd1: <WDC WD5000AADS-00S9B0>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(ahcisata0:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) 
(using DMA)
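
As a sanity check on the numbers in the timeout messages, here is a minimal
Python sketch that recomputes the absolute block number ("bn") from the
cylinder/head/sector triple, using the 16-head, 63-sector geometry that wd1
reports. The helper name chs_to_block is made up for this illustration, and
treating "sn" as zero-based is an assumption that happens to match the logged
values. In both messages bn also equals fsbn + 63, which would be consistent
with wd1a starting at sector 63, though that is an inference from the logs
and not something the disklabel confirms here.

```python
# Geometry as reported by the wd1 attach message: 16 heads, 63 sectors/track.
HEADS = 16
SECTORS_PER_TRACK = 63


def chs_to_block(cyl, head, sec, heads=HEADS, sectors=SECTORS_PER_TRACK):
    """Convert a cylinder/head/sector triple to an absolute block number.

    Hypothetical helper for illustration; 'sec' is taken as zero-based,
    which is an assumption that matches the logged values.
    """
    return (cyl * heads + head) * sectors + sec


# (fsbn, bn, cn, tn, sn) copied from the two timeout messages in this mail.
LOGGED = [
    (460630240, 460630303, 456974, 8, 7),
    (606429632, 606429695, 601616, 12, 11),
]

for fsbn, bn, cn, tn, sn in LOGGED:
    computed = chs_to_block(cn, tn, sn)
    # Report whether the CHS triple matches bn, and the fsbn-to-bn offset.
    print(f"bn={bn} computed={computed} match={computed == bn} offset={bn - fsbn}")
```

Both entries are internally consistent, so the messages themselves do not
point at a geometry or addressing mix-up.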

When the BIOS was configured without AHCI, the disk was handled by
the piixide driver and had similar trouble:

piixide0:1:0: lost interrupt
      type: ata tc_bcount: 512 tc_skip: 0
piixide0:1:0: intr with DRQ (st=0x58)
wd1a: device timeout writing fsbn 606429632 of 606429632-606429663 (wd1 bn 
606429695; cn 601616 tn 12 sn 11), retrying
wd1: soft error (corrected)

The disk itself has been changed multiple times without improvement. The
machine has also been swapped for a similar model, hence a specific
hardware fault seems unlikely.

And the oddest point: logs show the problem first appeared on January 25th,
2016. Logs between September 22nd, 2015 and that date show no problems.
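
For anyone who wants to pin down an onset date the same way, here is a rough
Python sketch that tallies "device timeout" messages per day and per device.
It assumes the usual syslog layout ("Mon DD HH:MM:SS host /netbsd: ...") with
each kernel message on a single line; the function name count_timeouts and
the default log path are only illustrative and may need adjusting.

```python
import re
import sys
from collections import Counter

# Assumed line layout (not taken from the actual log files):
#   Jan 25 03:12:01 host /netbsd: wd1a: device timeout reading fsbn ...
TIMEOUT_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+\d{2}:\d{2}:\d{2}\s+.*?"
    r"(?P<dev>wd\d+[a-p]?): device timeout (?:reading|writing)"
)


def count_timeouts(lines):
    """Return a Counter keyed by (month, day, device) for timeout messages."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m.group("month"), int(m.group("day")), m.group("dev"))] += 1
    return counts


if __name__ == "__main__":
    # Illustrative default path; pass another log file as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    with open(path, errors="replace") as fh:
        for (month, day, dev), n in count_timeouts(fh).items():
            print(f"{month} {day:2d} {dev}: {n} timeout(s)")
```

Syslog lines carry no year, so rotated log files have to be processed one at
a time to keep the dates unambiguous.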

Does this ring a bell for anyone?
-- 
Emmanuel Dreyfus
m...@netbsd.org
