On Tue, Jan 30, 2007 at 03:37:36PM -0800, Gary Hade wrote:
> On Tue, Jan 30, 2007 at 04:32:34PM +0900, Tejun Heo wrote:
> > Hello, Gary.
> > 
> > Gary Hade wrote:
> > >>> If they verify your fix (ie,
> > >>> GoVault sometimes takes more than 150ms to transmit the first D2H Reg FIS
> > >>> after SRST), I'll push similar patch upstream.
> > >> Thanks.  If you think that changes to increase the delays are
> > >> the way to go (at least until we can find a better solution)
> > >> I can provide patches.
> > > 
> > > Tejun, 
> > > I haven't heard anything from you on this so I'm including a delay
> > > increase patch against 2.6.20-rc6 for the 'ata-piix' case below.  
> > > I hope that you, Jeff, and others find this acceptable.
> > 
> > Sorry about being unresponsive.  The thing is that the change adds
> > unnecessary 2 secs of delay to a lot of other normal device-not-present
> > cases, so I was hesitant to ack the patch.  I'll give it more thoughts
> > (and respond timely this time :-)
> 
> Thanks!  My followup was untimely so we're even. :-)
> 
> Some of my random thoughts:
> There does appear to be this invalid assumption that 0xFF status 
> always implies device-not-present.  The status register access 
> restrictions in ATA/ATAPI-7 V1 5.14.2 include the statement "The 
> contents of this register, except for BSY, shall be ignored when 
> BSY is set to one." which the code does not honor.  There is apparently 
> past experience that 0xFF status implies device-not-present for some
> controllers (the odd clowns :) but I have no idea how common these are.
> We obviously can't get rid of the check but since we cannot clear
> the read-only status register and there appears to be no specification 
> dictated upper limit on how long it should take for a software reset to 
> complete it just seems like we need to wait long enough to support the 
> slowest known device which may be the GoVault.
> 
> > 
> > > With respect to the 'ahci' case w/2.6.20-rc6 the GoVault device is 
> > > useable following boot although the below messages are being logged 
> > > during initialization.  Please let me know if you have any thoughts 
> > > on this.  
> > >   scsi1 : ahci
> > >   ata2: softreset failed (port busy but CLO unavailable)
> > >   ata2: softreset failed, retrying in 5 secs
> > >   ata2: port is slow to respond, please be patient (Status 0x80)
> > >   ata2: port failed to respond (30 secs, Status 0x80)
> > >   ata2: COMRESET failed (device not ready)
> > >   ata2: hardreset failed, retrying in 5 secs
> > >   ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> > >   ata2.00: ATAPI, max UDMA/66
> > >   ata2.00: configured for UDMA/66
> > 
> > The above should have been fixed in 2.6.20-rc6.  Please test it.  It was
> > caused by the ahci driver incorrectly clearing ahci CAP register and
> > fixed recently.
> 
> I'm clearly seeing this with 2.6.20-rc6 but unlike the ata-piix
> issue it does not appear to be dependent on the port to which the
> device is attached.  I've been playing around with this today and
> found that it could be solved by inserting a delay between the 
> ahci_stop_engine() call and BSY/DRQ check.
> 
> This change:
> --- linux-2.6.20-rc6/drivers/ata/ahci.c.orig  2007-01-30 11:01:20.000000000 -0800
> +++ linux-2.6.20-rc6/drivers/ata/ahci.c       2007-01-30 12:59:38.000000000 -0800
> @@ -804,6 +804,19 @@ static int ahci_softreset(struct ata_por
>               goto fail_restart;
>       }
> 
> +     {
> +             int delay;
> +             u8 stat;
> +             for (delay = 0; delay < 2000; delay+=100) {
> +                     if (!(ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)))
> +                             break;
> +                     msleep(100);
> +                     stat = ahci_check_status(ap);
> +                     ata_port_printk(ap, KERN_INFO, "delay=%d BSY=%d DRQ=%d\n",
> +                             delay, (stat & ATA_BUSY)?1:0, (stat & ATA_DRQ)?1:0);
> +             }
> +     }
> +
>       /* check BUSY/DRQ, perform Command List Override if necessary */
>       if (ahci_check_status(ap) & (ATA_BUSY | ATA_DRQ)) {
>               rc = ahci_clo(ap);
> 
> Yielded this output both with and without the RDC inserted:
> scsi1 : ahci
> ata2: delay=0 BSY=1 DRQ=0
> ata2: delay=100 BSY=1 DRQ=0
> ata2: delay=200 BSY=1 DRQ=0
> ata2: delay=300 BSY=1 DRQ=0
> ata2: delay=400 BSY=1 DRQ=0
> ata2: delay=500 BSY=1 DRQ=0
> ata2: delay=600 BSY=1 DRQ=0
> ata2: delay=700 BSY=1 DRQ=0
> ata2: delay=800 BSY=1 DRQ=0
> ata2: delay=900 BSY=1 DRQ=0
> ata2: delay=1000 BSY=1 DRQ=0
> ata2: delay=1100 BSY=1 DRQ=0
> ata2: delay=1200 BSY=0 DRQ=0
> ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
> ata2.00: ATAPI, max UDMA/66
> ata2.00: configured for UDMA/66
> 
> So it appears that we may also have a similar device slowness issue 
> with this driver.

Tejun,
I instrumented the code and found that, for the SATA hard drive, BSY was set
just before the call to ahci_init_port() from ahci_port_start() and clear
after the return from ahci_init_port().  For the GoVault, BSY was still set
after the return from ahci_init_port() and remained set for almost 2 seconds.

The patch below, which gives BSY some extra time to clear, fixes the problem.
Unlike the extra delay that the GoVault needs with ata-piix, I believe this
delay will only be incurred for attached devices that need it.  Please let me
know what you think.
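
For illustration only, the polling pattern can be sketched in userspace C.
This is not the kernel code: sim_dev, sim_check_status(), sim_msleep() and
wait_bsy_clear() are hypothetical names that simulate a device holding BSY
for a fixed time, and 0x50 is just an assumed "ready" status value.  The
sketch is meant to show that a device with BSY already clear never sleeps,
a slow device sleeps only until BSY clears, and a stuck device gives up at
the timeout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ATA_BUSY 0x80u  /* BSY bit of the ATA status register */

/* Hypothetical simulated device: BSY stays set until busy_ms have elapsed. */
struct sim_dev {
	unsigned busy_ms;   /* how long the device keeps BSY set */
	unsigned now_ms;    /* simulated clock, advanced only by sim_msleep() */
	unsigned sleeps;    /* number of poll sleeps taken so far */
};

/* Stand-in for a status register read (assumption: 0x50 means ready). */
static uint8_t sim_check_status(const struct sim_dev *d)
{
	return d->now_ms < d->busy_ms ? ATA_BUSY : 0x50;
}

/* Stand-in for msleep(): advances the simulated clock. */
static void sim_msleep(struct sim_dev *d, unsigned ms)
{
	d->now_ms += ms;
	d->sleeps++;
}

/*
 * Poll until BSY clears or timeout_ms elapse.
 * Returns true if BSY is clear on exit, false if the wait timed out.
 */
static bool wait_bsy_clear(struct sim_dev *d, unsigned timeout_ms,
			   unsigned poll_ms)
{
	uint8_t status;
	unsigned deadline;

	assert(poll_ms > 0);

	status = sim_check_status(d);
	deadline = d->now_ms + timeout_ms;

	/* Devices that are already idle skip the loop entirely. */
	while ((status & ATA_BUSY) && d->now_ms < deadline) {
		sim_msleep(d, poll_ms);
		status = sim_check_status(d);
	}

	return !(status & ATA_BUSY);
}
```

With a 3000 ms timeout and a 50 ms poll interval, a simulated device that
holds BSY for 1900 ms costs 38 sleeps, an idle device costs none, and a
device that never clears BSY is abandoned after 60 sleeps.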

Thanks.

Gary

-- 
Gary Hade
System x Enablement
IBM Linux Technology Center
503-578-4503  IBM T/L: 775-4503
[EMAIL PROTECTED]
http://www.ibm.com/linux/ltc


We encountered a problem where the BSY status bit is still
set on entry to the 'ahci' error handler during initialization
of the Quantum GoVault when attached to an ICH6R/ICH6RW controller.
The BSY/DRQ check then failed, causing the software reset to fail
and forcing a hard reset, with the following messages logged:
  ata1: softreset failed (port busy but CLO unavailable)
  ata1: softreset failed, retrying in 5 secs
  ata1: port is slow to respond, please be patient (Status 0x80)
  ata1: port failed to respond (30 secs, Status 0x80)
  ata1: COMRESET failed (device not ready)
  ata1: hardreset failed, retrying in 5 secs

It was taking almost 2 seconds for BSY to clear following the
return from ahci_init_port() in ahci_port_start(), so this patch
gives BSY up to 3 seconds of extra time to clear, which eliminates
the problem.

Signed-off-by: Gary Hade <[EMAIL PROTECTED]>

--- linux-2.6.20-rc7/drivers/ata/ahci.c.orig    2007-02-16 10:11:21.000000000 -0800
+++ linux-2.6.20-rc7/drivers/ata/ahci.c 2007-02-16 13:23:04.000000000 -0800
@@ -1423,6 +1423,8 @@ static int ahci_port_start(struct ata_po
        void *mem;
        dma_addr_t mem_dma;
        int rc;
+       u8 status;
+       unsigned long timeout;
 
        pp = kmalloc(sizeof(*pp), GFP_KERNEL);
        if (!pp)
@@ -1477,6 +1479,17 @@ static int ahci_port_start(struct ata_po
        /* initialize port */
        ahci_init_port(port_mmio, hpriv->cap, pp->cmd_slot_dma, pp->rx_fis_dma);
 
+       status = ahci_check_status(ap);
+
+       /* for some devices we need to delay to allow BSY to clear */
+       if (status & ATA_BUSY) {
+               timeout = jiffies + 3*HZ;
+               while ((status & ATA_BUSY) && time_before(jiffies, timeout)) {
+                       msleep(50);
+                       status = ahci_check_status(ap);
+               }
+       }
+
        return 0;
 }
 