On Mon, Jan 28, 2008 at 08:12:10PM -0500, Jeff Layton wrote:
> On Mon, 28 Jan 2008 19:05:17 -0500
> "J. Bruce Fields" <[EMAIL PROTECTED]> wrote:
>
> > On Mon, Jan 28, 2008 at 02:29:11PM -0500, Jeff Layton wrote:
> > > It's possible to end up continually looping in
> > > nlmsvc_retry_blocked() even when the lockd kthread has been
> > > requested to come down. I've been able to reproduce this fairly
> > > easily by having an RPC grant callback queued up to an RPC client
> > > that's not bound. If the other host is unresponsive, then the block
> > > never gets taken off the list, and lockd continually respawns new
> > > RPCs.
> > >
> > > If lockd is going down anyway, we don't want to keep retrying the
> > > blocks, so allow it to break out of this loop. Additionally, in that
> > > situation kthread_stop will have already woken lockd, so when
> > > nlmsvc_retry_blocked returns, have lockd check kthread_should_stop() so
> > > that it doesn't end up calling svc_recv and waiting for a long time.
> >
> > Stupid question: how is this different from the kthread_stop() coming
> > just after this kthread_should_stop() call but before we call
> > svc_recv()? What keeps us from waiting in svc_recv() indefinitely
> > after kthread_stop()?
> >
>
> Nothing at all, unfortunately :-(
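
To make the window concrete, here is a rough userspace model of the
ordering being discussed. It is only a sketch, not lockd code: the enum,
the function name and the tick counts are hypothetical, and it assumes
that the wakeup from kthread_stop() has no effect on a thread that is
still running and that the sleep does not re-check the stop flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: where the stop request lands relative to the
 * thread's "check the flag, then sleep" sequence. */
enum stop_point {
	STOP_BEFORE_CHECK,	/* flag already set when the thread checks it */
	STOP_AFTER_CHECK,	/* flag set after the check, before the sleep */
	STOP_DURING_SLEEP	/* flag set while the thread is asleep */
};

/* Returns how many ticks the thread sleeps before it can exit. */
static unsigned long ticks_asleep(enum stop_point stop_at,
				  unsigned long timeout)
{
	bool stop_requested = (stop_at == STOP_BEFORE_CHECK);

	assert(timeout >= 1);

	/* Step 1: the running thread checks the stop flag. */
	if (stop_requested)
		return 0;

	/*
	 * Step 2: with STOP_AFTER_CHECK the request arrives here. The
	 * flag gets set, but (by assumption) the wakeup is a no-op
	 * because the thread is not asleep yet.
	 */

	/* Step 3: the thread sleeps without looking at the flag again. */
	if (stop_at == STOP_DURING_SLEEP)
		return 1;	/* woken one tick into the sleep */

	return timeout;		/* wakeup was lost: full timeout */
}
```

In this model only the STOP_AFTER_CHECK case sleeps for the whole
timeout, which is the case the question above is asking about.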
>
> Since we've taken signalling out of the lockd_down codepath, this gets
> a lot trickier than it used to be. I'm starting to wonder if we should
> go back to sending a signal on lockd_down.
>
OK, thanks. So do I keep these patches, or not? This sounds like a
regression (if perhaps a very minor one--I'm not quite clear). Help!
--b.
>
> > > This patch prevents this continual looping and allows lockd to
> > > eventually come down, but this can take quite some time since lockd
> > > won't check the condition in the nlmsvc_retry_blocked while loop
> > > until nlmsvc_grant_blocked returns. That won't happen until
> > > all of the portmap requests time out.
> >
> > So, shutdown problems aside, does that mean an unresponsive client can
> > cause lockd to stop serving lock requests in normal operation?
> >
>
> Possibly.
>
> I think that in general what happens is that blocks eventually get
> reinserted into the list with an expire time that is well after the
> current jiffies and we eventually hit this condition:
>
> 	if (time_after(block->b_when, jiffies)) {
> 		timeout = block->b_when - jiffies;
> 		break;
> 	}
>
> ...but I was seeing lockd spin in this loop for several iterations
> without returning back up to the main lockd loop. I think what was
> happening was that we were inserting the block with a later jiffies in
> nlmsvc_grant_blocked, but the rpcbind attempts would end up taking
> longer than that timeout. So when we returned from
> nlmsvc_grant_blocked() that condition would no longer fire, and we'd end up
> retrying the grant callback again.
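
The behaviour described above can be sketched with a small userspace
simulation. This is only an illustration under stated assumptions, not
the lockd code: struct sim, retry_blocked(), requeue_delay and
grant_cost are made-up names, the clock is a plain counter standing in
for jiffies, and the grant attempt is modelled as "reinsert the block
with a later expiry, then burn grant_cost ticks" (e.g. waiting on
rpcbind timeouts).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
static bool time_after_ul(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Hypothetical simulation state; none of these names come from lockd. */
struct sim {
	unsigned long jiffies;		/* simulated clock */
	unsigned long b_when;		/* when the block is next due */
	unsigned long requeue_delay;	/* delay used when reinserting */
	unsigned long grant_cost;	/* ticks one grant attempt takes */
	bool should_stop;		/* stand-in for kthread_should_stop() */
};

/*
 * Model of the retry loop. Returns the number of grant attempts made
 * before the loop exits; max_iter only bounds the simulation.
 */
static unsigned int retry_blocked(struct sim *s, bool check_stop,
				  unsigned int max_iter,
				  unsigned long *timeout)
{
	unsigned int attempts = 0;

	assert(s != NULL && timeout != NULL);
	*timeout = 0;

	while (attempts < max_iter) {
		/* Proposed exit: give up if the thread is being stopped. */
		if (check_stop && s->should_stop)
			break;

		/* Normal exit: the block is not due yet. */
		if (time_after_ul(s->b_when, s->jiffies)) {
			*timeout = s->b_when - s->jiffies;
			break;
		}

		/* Grant attempt: requeue first, then spend the time. */
		s->b_when = s->jiffies + s->requeue_delay;
		s->jiffies += s->grant_cost;
		attempts++;
	}
	return attempts;
}
```

With grant_cost larger than requeue_delay, the block is already due
again every time the attempt returns, so the time_after() exit never
fires and only the stop check (or max_iter in this model) ends the
loop. With grant_cost smaller than requeue_delay, the loop exits after
one attempt with a non-zero timeout.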
>
> As to why we don't see this more often, I'm not sure. I probably need
> to experiment a bit more with it to make sure I'm understanding this
> correctly.
>
> Let me do that and get back to you...