I don't have any good ideas here - are you saying that with the logging
enabled the timeouts don't occur?  The implication to me is that the
TEST_UNIT_READY command is taking a bit longer to complete than the timeout
value for the command.  With the logging, it finishes before the timeout
hits.

    You could play with the timeout for TEST_UNIT_READY and see whether this
changes matters.

    Someday we need to come up with a way whereby individual drivers have
more control over the amount of time a command is expected to run.  There
has to be a sort of negotiation between the upper level driver which may
have reason to expect that a command will complete in a certain amount of
time, and the low-level driver which may have reasons of its own for
believing that the thing will take longer than expected.
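
    As a rough illustration only, such a negotiation might look something
like the sketch below.  None of these names exist in the SCSI layer today;
the request structure, the callback type, the ceiling parameter and the
2 second figure are all made up for the example.  The upper level proposes
a timeout, the low-level driver may raise it, and an optional ceiling keeps
a misbehaving driver from stalling error recovery forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: these types and functions are NOT part of the
 * kernel SCSI layer; they only illustrate the negotiation idea. */

struct cmd_timeout_request {
    unsigned char opcode;    /* SCSI opcode, e.g. 0x00 = TEST UNIT READY */
    unsigned int  upper_ms;  /* timeout the upper-level driver expects */
};

/* Optional low-level driver hook: returns the minimum time (ms) it
 * believes the command needs, or 0 if it has no opinion. */
typedef unsigned int (*lld_min_timeout_fn)(unsigned char opcode);

/* Start from the upper level's value, let the low-level driver raise it,
 * then clamp to ceiling_ms (0 means "no ceiling"). */
unsigned int negotiate_timeout_ms(const struct cmd_timeout_request *req,
                                  lld_min_timeout_fn lld_min,
                                  unsigned int ceiling_ms)
{
    unsigned int t;

    assert(req != NULL);
    t = req->upper_ms;

    if (lld_min != NULL) {
        unsigned int floor_ms = lld_min(req->opcode);
        if (floor_ms > t)
            t = floor_ms;
    }
    if (ceiling_ms != 0 && t > ceiling_ms)
        t = ceiling_ms;
    return t;
}

/* Example low-level policy (assumed numbers): an adapter that wants at
 * least 2 seconds for TEST UNIT READY and has no opinion otherwise. */
unsigned int example_lld_min_timeout(unsigned char opcode)
{
    return opcode == 0x00 ? 2000u : 0u;
}
```

    With an upper-level request of 500ms for TEST UNIT READY, the example
policy raises the result to 2000ms; passing a ceiling of 1000ms caps it at
1000ms, and passing no low-level hook leaves it at 500ms.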

-Eric

----- Original Message -----
From: "Kenn Humborg" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Friday, May 12, 2000 8:40 PM
Subject: Re: DEC RZ55 and Advansys ABP3925


> On Tue, May 09, 2000 at 11:54:28PM +0100, Kenn Humborg wrote:
> >
> > I've got a problem with two DEC RZ55 drives on an Advansys ABP3925
> > host adapter.
> >
> > However, if I load the module while the drives are powered down,
> > I get the following:
> >
> > May  9 21:51:19 avalon kernel: scsi : aborting command due to timeout :
> >    pid 49, scsi0, channel 0, id 1, lun 0 Test Unit Ready 00 00 00 00 00
> [...snip...]
> >
> > To get some more info, I compiled up 2.3.99-pre6 with scsi logging and
> > did echo "scsi log all" > /proc/scsi/scsi.  After doing this, a
> > modprobe advansys generated loads of info, but worked fine without
> > any delays.
> >
> > This suggests to me that the issue is timing related.  So, I want to
> > try enabling various subsets of the logging types until I see which
> > one 'cures' the timeout problem.
>
> OK, I've played around with this some more and here are a few more
> clues.  These tests were performed with the 3.3A driver from
> ftp.advansys.com on 2.3.99-pre6.  (I had to comment out a few
> printk()'s that don't compile when ADVANSYS_DEBUG is defined.)
>
> The hardware setup is
>
>    ABP3925 adapter.  Nothing connected to internal bus connector.
>    Termination set to Enabled in BIOS.
>
>    1 metre cable to DEC BA42A disk enclosure.  This box holds
>    two RZ55 disks.  The internal twisted-pair ribbon cable in
>    this box is about 60cm long.  The two IDC headers on the cable
>    are about 30cm apart.  This cable is the original DEC cable.
>
>    The BA42A has two external Centronics connectors.  One goes
>    to the ABP3925 and the other is terminated with a DEC 50-pin
>    Centronics-style terminator.
>
>    The RZ55s were powered down for the duration of this testing.
>
> If I disconnect both drives, the driver loads OK (as expected).
>
> If I connect one drive (doesn't matter which one, I've tried both
> individually), the driver loads OK (as expected).
>
> If I disconnect the terminator, and leave both drives connected,
> the driver fails to load with
>
>    advansys: AscInitAsc1000Driver: board 0: error: init_state 13e, warn 0 error 8
>
> (not quite as expected, but understandable).
>
> If I connect both drives and the terminator, the driver takes
> about a minute to load.  (This time I don't get any error messages
> because we're using the new SCSI error handling, which isn't as
> chatty as the stuff in scsi_obsolete.c).
>
> Turning on logging by setting asc_dbglvl to 1 or 2 and by playing
> with echo "scsi log ..." > /proc/scsi/scsi caused the driver to
> load correctly again.
>
> So, I started to narrow down exactly how much needed to be logged
> to make it work.  I found that I could reduce it to this:
>
>     ASC_DBG(1, "advansys_interrupt: end\n");
> ->  printk("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> ->         "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n");
> ->  printk("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
> ->         "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n");
>     return;
>
> at the end of advansys_interrupt().
>
> I always killed klogd during the module load to eliminate the
> effect that disk I/O might have.  So, during the test, the log
> messages were only going to the console.  Also, those two
> printk()'s were only sufficient if the current cursor position
> on the console was at the bottom, so that the console had to
> scroll while printing.  I imagine that different machines with
> different CPU speeds and graphics cards will require different
> amounts of log output here to 'fix' the problem.
>
> So, this is my theory:
>
>    The additional console output delays handling of the next
>    interrupt slightly, thus allowing time for _something_ in
>    the card status to change before the ISR deals with the
>    interrupt.
>
> So, let's try adding a small delay to the start of the ISR:
>
>     ASC_DBG(1, "advansys_interrupt: begin\n");
>
> ->  mdelay(10);
>
>     /*
>      * Check for interrupts on all boards.
>      * AscISR() will call asc_isr_callback().
>      */
>
> This doesn't help.  The driver still goes into slow error-recovery
> for each SCSI device ID that it tries to scan for.
>
> So where do we go from here?  Does anyone need me to check anything
> else?  Want any more info?
>
> Someone else suggested that maybe my host adapter is not supplying
> termination power.  Unfortunately, I haven't been able to bring
> home a voltmeter to check this.  In any case, dodgy termination
> shouldn't lead to this kind of slow error recovery (with interrupts
> being locked out for 500ms during part of it), should it?
>
> Later,
> Kenn
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to [EMAIL PROTECTED]
>


