On Wed, Feb 19, 2003 at 10:20:12AM +1100, Bruce Evans wrote:
> On Tue, 18 Feb 2003, Ruslan Ermilov wrote:
> 
> > On Fri, Feb 14, 2003 at 05:10:40AM -0800, Alfred Perlstein wrote:
> > > alfred      2003/02/14 05:10:40 PST
> > >
> > >   Modified files:
> > >     sys/kern             kern_intr.c
> > >     sys/dev/ata          ata-all.c
> > >   Log:
> > >   Fix crash dumps on ata and scsi.
> > >
> > [...]
> > >   To fix ata, use what appears to be a polling method if we're dumping,
> > >   I stole this from tmm but added code to ensure that this change is
> > >   only in effect while dumping.
> > >
> > >   Tested by: des
> > >
> > FWIW, if I propagate this change to the !dumping case, it also
> > fixes the ``resume gets stuck at "ata1: resetting devices .."'' bug
> > I was having with my ThinkPad 600X:
> >
> > %%%
> > Index: ata-all.c
> > ===================================================================
> > RCS file: /home/ncvs/src/sys/dev/ata/ata-all.c,v
> > retrieving revision 1.165
> > diff -u -p -r1.165 ata-all.c
> > --- ata-all.c       14 Feb 2003 13:10:40 -0000      1.165
> > +++ ata-all.c       18 Feb 2003 10:08:22 -0000
> > @@ -486,8 +486,7 @@ ata_getparam(struct ata_device *atadev,
> >
> >      /* apparently some devices needs this repeated */
> >      do {
> > -   if (ata_command(atadev, command, 0, 0, 0,
> > -           dumping ? ATA_WAIT_READY : ATA_WAIT_INTR)) {
> > +   if (ata_command(atadev, command, 0, 0, 0, ATA_WAIT_READY)) {
> >         ata_prtdev(atadev, "%s identify failed\n",
> >                    command == ATA_C_ATAPI_IDENTIFY ? "ATAPI" : "ATA");
> >         free(ata_parm, M_ATA);
> > %%%
> 
> There is, or was, something near here that made the whole system go
> unresponsive (as seen by nfs clients) for several seconds.  I guess
> the main problem was just using polled mode in all cases here.  In
> RELENG_4, polling is done at splbio() so normally only disk devices
> are blocked, but under -current almost everything is blocked by Giant.
> 
The symptoms were as follows.  The console is blocked, and anything I
type is not echoed until I drop into DDB, at which point what I typed
is displayed.

> > The resume session (with apm(4)) now looks like this:
> >
> > : cbb0: PCI Memory allocated: 50103000
> > : cbb1: PCI Memory allocated: 50102000
> > : pcm0: detached
> > : csa: card is Thinkpad 600X/A20/T20
> > : pcm0: <CS461x PCM Audio> on csa0
> > : pcm0: <Cirrus Logic CS4297A ac97 codec>
> > : wakeup from sleeping state (slept 00:00:10)
> > : ata0: resetting devices ..
> > : done
> > : ata1: resetting devices ..
> > : ata1-slave: timeout waiting for cmd=ec s=01 e=24
> > : ata1-slave: ATA identify failed
> > : done
> 
> Apparently either the timeout is too short or the interrupt got lost;
> the timeout seems the more likely culprit.  It is 10 seconds, but IIRC
> the spec says 30 seconds for reset of the master and a bit more for the
> slave.  Since things work with polling, we know that the device state
> changed properly.  We could test for this state change instead of
> always aborting after the timeout, and do more, finer-grained sleeps
> to determine the precise timeout required.
> 
I recall seeing the ``stray irq 15'' message too, so yes, that may well
be the case here.  I will try bumping up the ATA_WAIT_INTR timeout
later today and let you know the results.
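
For reference, the idea quoted above (test for the device state change
in small steps instead of giving up after one fixed wait, and note how
long it really took) could be sketched roughly as follows.  This is only
an illustration under stated assumptions, not code from ata-all.c:
poll_until_ready(), the two callback types and the fake_dev helpers are
hypothetical names, and the step and timeout values a caller passes are
placeholders.  The two status bits are the standard ATA BSY and DRDY
bits.

```c
#include <stdint.h>

/* Standard ATA status register bits. */
#define ATA_S_READY 0x40	/* DRDY: device is ready */
#define ATA_S_BUSY  0x80	/* BSY: device is busy */

/* Hypothetical hooks, so the loop does not depend on real hardware. */
typedef uint8_t (*read_status_fn)(void *ctx);
typedef void (*delay_fn)(void *ctx, unsigned ms);

/*
 * Poll the status register every step_ms milliseconds until the device
 * reports "not busy and ready", or until timeout_ms has elapsed.
 * Returns the elapsed time in milliseconds on success (useful for
 * finding out what timeout a device really needs), or -1 on timeout.
 */
static long
poll_until_ready(read_status_fn read_status, delay_fn delay, void *ctx,
    unsigned step_ms, unsigned timeout_ms)
{
	unsigned elapsed = 0;

	for (;;) {
		uint8_t st = read_status(ctx);

		if ((st & ATA_S_BUSY) == 0 && (st & ATA_S_READY) != 0)
			return ((long)elapsed);
		if (elapsed >= timeout_ms)
			return (-1);
		delay(ctx, step_ms);
		elapsed += step_ms;
	}
}

/* Hypothetical simulated device: becomes ready at ready_at_ms. */
struct fake_dev {
	unsigned now_ms;	/* simulated clock */
	unsigned ready_at_ms;	/* time at which BSY clears */
};

static uint8_t
fake_status(void *ctx)
{
	struct fake_dev *d = ctx;

	return (d->now_ms >= d->ready_at_ms ? ATA_S_READY : ATA_S_BUSY);
}

static void
fake_delay(void *ctx, unsigned ms)
{
	struct fake_dev *d = ctx;

	d->now_ms += ms;	/* advance the simulated clock only */
}
```

With the simulated device, a call such as
poll_until_ready(fake_status, fake_delay, &d, 100, 31000) returns the
simulated time at which the device became ready, or -1 if it never did
within the limit.  In a real driver the delay hook would have to be a
sleep rather than a busy-wait, so that the rest of the system is not
blocked while waiting.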


Cheers,
-- 
Ruslan Ermilov          Sysadmin and DBA,
[EMAIL PROTECTED]           Sunbay Software AG,
[EMAIL PROTECTED]          FreeBSD committer,
+380.652.512.251        Simferopol, Ukraine

http://www.FreeBSD.org  The Power To Serve
http://www.oracle.com   Enabling The Information Age
