Tom Tucker wrote:
> Bruce:
>
> I'll take a look...
>
> Tom
>
> On Mon, 2008-02-18 at 12:45 -0500, J. Bruce Fields wrote:
>   
>> On Sun, Feb 17, 2008 at 04:02:45PM -0500, Trond Myklebust wrote:
>>     
>>> Hi Bruce,
>>>
>>> Here is a question for you.
>>>
>>>         Why does svc_close_all() get away with deleting xprt->xpt_ready
>>>         without holding the pool->sp_lock?
>>>       
>> From a quick look--I think the intention is that the code that calls it
>> (in svc_destroy()) is only called after all other server threads have
>> exited, and that there can't be anyone else monkeying with that service
>> any more.  But I haven't verified that really carefully.
>>     
That's certainly the intention.  The serv->sv_nrthreads field is used
as a refcount, which counts 1 for each nfsd thread and sometimes 1 to
guard some short-term manipulations.  This refcount should not drop to
zero until the last thread exits.  So by the time svc_close_all() is
called, no thread can be looking at a pool's sp_sockets list.  Each
xprt could still be racily added to that list from softirq-mode
data-ready handlers calling svc_xprt_enqueue(), but only until the
xprt's xpo_detach method is called (which removes any data-ready
callbacks).  At that point no code should be modifying the xpt_ready
field; it may or may not still link the xprt into some pool's
sp_sockets list, but we don't care, because all the pools are about to
be destroyed anyway.
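
To make that ordering concrete, here's a minimal user-space sketch of
the argument.  The names below are just illustrative stand-ins for the
sunrpc ones (and most of the locking detail is elided), so read it as
a model of the reasoning, not as the kernel code:

#include <stdlib.h>

struct xprt {
        struct xprt *xpt_ready_next;    /* stands in for the xpt_ready linkage */
};

struct pool {
        /* sp_lock would normally protect sp_sockets while threads run */
        struct xprt *sp_sockets;        /* the list xpt_ready links into */
};

struct serv {
        int sv_nrthreads;               /* one per nfsd thread, plus the
                                         * occasional short-term hold */
        struct pool pool;
};

/*
 * Only reached from the final "destroy" call, i.e. after sv_nrthreads
 * has dropped to zero and every transport's data-ready callbacks have
 * been removed by its xpo_detach.  Nothing else can touch the list any
 * more, which is why it is walked and unlinked without the pool lock.
 */
static void close_all(struct serv *serv)
{
        struct xprt *xprt = serv->pool.sp_sockets;

        while (xprt) {
                struct xprt *next = xprt->xpt_ready_next;

                xprt->xpt_ready_next = NULL;    /* "delete xpt_ready", lockless */
                free(xprt);
                xprt = next;
        }
        serv->pool.sp_sockets = NULL;
}

static void put_serv(struct serv *serv)
{
        /* the real decrement is serialized by the callers */
        if (--serv->sv_nrthreads > 0)
                return;                 /* other threads still hold a reference */
        close_all(serv);                /* last reference gone: safe to tear down */
}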

That's the way it's been working since 2.6.19, and I don't think any
of Tom's patches changed that.

I can think of a couple of things that could be wrong:

 * serv->sv_nrthreads is manipulated in a few places, and one of them
   might be getting the count wrong (when I left the code, all
   increments and decrements of that field went through svc_get() and
   svc_destroy(), but other changes have crept in since).

 * Currently running data-ready callbacks might be racing with xpo_detach.
   Moving that call inside the spin_lock_bh() critical section just
   after it might help (rough sketch below).
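
Here's the before/after sketch of that second suggestion.  It assumes
the critical section meant is the one entered right after the
xpo_detach call in svc_delete_xprt(); the user-space types and the
mutex standing in for spin_lock_bh() are illustrative, not the kernel
code:

#include <pthread.h>

struct xprt_ops {
        void (*xpo_detach)(void *xprt); /* removes the data-ready callbacks */
};

struct serv {
        pthread_mutex_t sv_lock;        /* stand-in for the spinlock */
};

/* Current shape: detach runs before the lock is taken, so a data-ready
 * callback that is already executing can overlap with the unlinking
 * done under the lock. */
static void delete_xprt_current(struct serv *serv, struct xprt_ops *ops,
                                void *xprt)
{
        ops->xpo_detach(xprt);

        pthread_mutex_lock(&serv->sv_lock);
        /* ... unlink the xprt from the server's lists ... */
        pthread_mutex_unlock(&serv->sv_lock);
}

/* Suggested shape: pull the detach inside the critical section.
 * Whether this actually closes the race depends on which lock the
 * data-ready path takes; as said above, it only "might help". */
static void delete_xprt_suggested(struct serv *serv, struct xprt_ops *ops,
                                  void *xprt)
{
        pthread_mutex_lock(&serv->sv_lock);
        ops->xpo_detach(xprt);
        /* ... unlink the xprt from the server's lists ... */
        pthread_mutex_unlock(&serv->sv_lock);
}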



>>>> For more on this problem see
>>>> http://marc.info/?l=linux-kernel&m=120293042005445
>>>
>>> There's the Bugzilla entry for it at
>>>
>>> http://bugzilla.kernel.org/show_bug.cgi?id=9973

It's not clear from the bugzilla that NFS is at fault here.


-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
The cake is *not* a lie.
I don't speak for SGI.
