On Thu, Nov 18, 2010 at 10:20:33AM +0100, [email protected] wrote:
>
> hm... thinking about it... does the kernel assume (maybe in early
> initialization) that calling qlock() without a proc is ok as long as
> it can make sure it will not be held by another proc?
>
That's a question for Bell Labs, I suppose, but that's precisely what I
believe. There is no other explanation for the panic. Moving the
up == 0 test earlier will invalidate this assumption and cause the panic
we have already seen.

The issue here is whether there is a situation where qlock() is
intentionally invoked with up == 0 (as suggested by the positioning of
the up == 0 test _after_ the "locked" condition is set). This seems
improbable, though, and needs sorting out: setting the lock can be done
with up == 0, and so can clearing it, but we cannot _fail_ to set the
lock, because then the absence of up will trigger a panic.
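
To make that ordering concrete, here is a small user-space model in C.
It is a sketch under my own assumptions, not the kernel source: Proc,
QLock, qlockmodel() and qunlockmodel() are hypothetical stand-ins, the
spin lock protecting the QLock is left out, and a panic is modelled as
a return value so the cases can be compared side by side.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef struct Proc { int pid; } Proc;
typedef struct QLock { int locked; } QLock;

/* Possible outcomes of a lock attempt in this model. */
enum { Acquired, WouldSleep, WouldPanic };

Proc *up;	/* NULL models "no current process" (up == 0) */

/*
 * Models a qlock() in which the up == 0 test comes after the
 * attempt to set the "locked" condition.
 */
int
qlockmodel(QLock *q)
{
	if(!q->locked){
		q->locked = 1;	/* uncontended: up is never consulted */
		return Acquired;
	}
	if(up == NULL)
		return WouldPanic;	/* contended, and no proc to put to sleep */
	return WouldSleep;	/* a real kernel would queue up and reschedule */
}

/* Clearing the lock does not need up either. */
void
qunlockmodel(QLock *q)
{
	assert(q->locked);
	q->locked = 0;
}
```

In this model a caller with up == 0 gets Acquired on a free lock and
can release it again; only a second attempt on a lock that is already
held gives WouldPanic. That is the asymmetry I am describing.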

Now, we know that qlock() is called with up == 0: we have seen a panic
generated by such a call. Will it suffice to locate the invocation
and somehow deal with it, or should we make qlock() more robust and have
it reject any request made where up == 0? Certainly, if qlock() no
longer allows invocations with up == 0, its implementation can be
simplified. For example, the line
	if(up != nil && up->nlocks.ref)
		print("qlock: %#p: nlocks %lud\n", getcallerpc(&q),
			up->nlocks.ref);
will no longer need the up != nil test.
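
As a rough illustration of that simplification, here is a user-space
sketch in C of a stricter variant. Everything in it is an assumption
for the sake of argument: Proc, QLock and strictqlock() are
hypothetical names, nlocks is reduced to a plain counter, and the
rejection is modelled as a return value where the kernel would
presumably panic.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
typedef struct Proc { unsigned long nlocks; } Proc;
typedef struct QLock { int locked; } QLock;

/* Possible outcomes of a lock attempt in this model. */
enum { Acquired, WouldSleep, Rejected };

Proc *up;	/* NULL models "no current process" (up == 0) */

/* Models a qlock() that refuses every call made with up == 0. */
int
strictqlock(QLock *q)
{
	if(up == NULL)
		return Rejected;	/* tested first; a kernel would panic here */

	/* up is known to be valid from here on, so no nil test is needed */
	if(up->nlocks)
		fprintf(stderr, "qlock: nlocks %lu\n", up->nlocks);

	if(!q->locked){
		q->locked = 1;
		return Acquired;
	}
	return WouldSleep;	/* a real kernel would queue up and reschedule */
}
```

The price is visible in the first test: an uncontended call with
up == 0, which is harmless in the current ordering, is now refused
too, so every such caller would have to be found and dealt with.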

But I'm convinced there's more here than meets the eye. Unfortunately,
while I have a Plan 9 distribution at my fingertips, I'm not going to
try to fix this problem in a 9vx environment; I'll wait until I get home
to deal with the native stuff. But one can speculate...
++L