qpc is just the caller of the last successfully *acquired* qlock.
what we know is that the exportfs proc spins in the q->use taslock
called by qlock() right?  this already seems weird...  q->use is held
just long enough to test q->locked and manipulate the queue.  also
sched() will avoid switching to another proc while we are holding tas
locks.

i would like to know which qlock the kernel is trying to acquire
on behalf of exportfs that is also reachable from the etherread4
code.

one could move:

        up->qpc = getcallerpc(&q);

from qlock() before the lock(&q->use); so we can see from where that
qlock gets called that hangs the exportfs call, or add another magic
debug pointer (qpctry) to the proc structure and print it in dumpaproc().
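
a rough user-space sketch of the second idea, to show what the extra
pointer would buy us.  this is not the kernel source: the Proc and QLock
structs are cut-down stand-ins, qlockpc() is a hypothetical helper that
takes the caller pc as an argument instead of calling getcallerpc(), and
q->use is modelled as a plain int rather than a real spin lock.

```c
#include <stdint.h>

/* cut-down stand-ins for the kernel structures (assumed, not the real layout) */
typedef struct Proc Proc;
struct Proc {
	uintptr_t qpc;		/* caller of the last qlock that was acquired */
	uintptr_t qpctry;	/* assumed new field: caller of the last qlock attempted */
};

typedef struct QLock QLock;
struct QLock {
	int use;	/* stand-in for the q->use spin lock */
	int locked;
};

static Proc procmodel;
static Proc *up = &procmodel;

/*
 * hypothetical model of qlock(): returns 1 if the lock was taken,
 * 0 where the kernel would queue the proc and call sched().
 */
int
qlockpc(QLock *q, uintptr_t pc)
{
	up->qpctry = pc;	/* recorded before q->use is touched, so it survives a hang there */
	q->use = 1;		/* lock(&q->use) in the kernel */
	if(!q->locked){
		q->locked = 1;
		up->qpc = pc;	/* only updated on successful acquisition */
		q->use = 0;	/* unlock(&q->use) */
		return 1;
	}
	q->use = 0;
	return 0;
}
```

in this model, after a failed attempt qpc still names the old caller
while qpctry names the one that is stuck, which is the value we would
want dumpaproc() to print.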

--
cinap
--- Begin Message ---
> > acid: src(0xf0148c8a)
> > /sys/src/9/ip/tcp.c:2096
> >  2091               if(waserror()){
> >  2092                       qunlock(s);
> >  2093                       nexterror();
> >  2094               }
> >  2095               qlock(s);
> >>2096                qunlock(tcp);
> >  2097       
> >  2098               /* fix up window */
> >  2099               seg.wnd <<= tcb->rcv.scale;
> >  2100       
> >  2101               /* every input packet in puts off the keep alive time out */
> 
> The source actually says (to be pedantic):
> 
>       /* The rest of the input state machine is run with the control block
>        * locked and implements the state machine directly out of the RFC.
>        * Out-of-band data is ignored - it was always a bad idea.
>        */
>       tcb = (Tcpctl*)s->ptcl;
>       if(waserror()){
>               qunlock(s);
>               nexterror();
>       }
>       qlock(s);
>       qunlock(tcp);
> 
> Now, the qunlock(s) should not precede the qlock(s), this is the first
> case in this procedure:

it doesn't.  the body of the waserror() block can't be executed
before the code following it.  perhaps it could be more carefully written
as

> >  2095               qlock(s);
> >  2091               if(waserror()){
> >  2092                       qunlock(s);
> >  2093                       nexterror();
> >  2094               }
> >>2096                qunlock(tcp);

but it really wouldn't make any difference.
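
a small user-space model of why the order doesn't matter, using
setjmp/longjmp in place of the kernel's setlabel/gotolabel.  everything
here is a sketch: the errlab stack, raiseerr() and the event log are
hypothetical names, and the handler returns instead of calling
nexterror().

```c
#include <setjmp.h>
#include <string.h>

enum { Nerrlab = 8 };
static jmp_buf errlab[Nerrlab];	/* model of the per-proc error label stack */
static int nerrlab;

/* returns 0 when the label is set, non-zero when an error jumps back to it */
#define waserror()	(setjmp(errlab[nerrlab++]))

static char evlog[16];	/* zero-filled, so it always reads as a string */
static int nev;

static void
note(char c)
{
	if(nev < (int)sizeof evlog - 1)
		evlog[nev++] = c;
}

/* hypothetical stand-in for error(): pop the newest label and jump to it */
static void
raiseerr(void)
{
	longjmp(errlab[--nerrlab], 1);
}

/* model of the tcp input fragment; returns 1 if the error path ran */
int
input(int fail)
{
	if(waserror()){
		note('e');	/* qunlock(s) in the handler: reached only after an error */
		return 1;
	}
	note('l');		/* qlock(s) */
	if(fail)
		raiseerr();	/* some later call raises an error */
	note('u');		/* normal qunlock(s) */
	nerrlab--;		/* poperror() */
	return 0;
}
```

calling input(0) and then input(1) leaves "lule" in evlog: the handler's
unlock never runs ahead of the lock, because the first return from
waserror() is always 0.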

i'm not completely convinced that tcp's to blame.
and if it is, i think the problem is probably tcp
timers.

- erik

--- End Message ---
