On Thursday January 10, [EMAIL PROTECTED] wrote:
> 
> On Fri, 2008-01-11 at 08:51 +1100, Neil Brown wrote:
> > 
> > Is this a credible scenario?
> 
> Yes, but I have a scenario that I think trumps it:
> 
>       * the call that puts the open context is being made in
>         nfs_commit_done (or possibly nfs_writeback_done), causing it to
>         wait until the rpc_killall_tasks completes.
>       * The problem is that rpc_killall_tasks won't complete until the
>         rpc_task that is stuck in nfs_commit_done/nfs_writeback_done
>         exits.
> 
> Urgh...


Oh, yes, that would be ugly!

> 
> I'm surprised that we can get into this state, though. How is
> sys_umount() able to exit with either readaheads or writebacks still
> pending? Is this perhaps occurring on a lazy umount?

It does sound like it has to be a lazy unmount, doesn't it?  I don't
think autofs will do that.  The computer in question was suspended
while on a very slow network, and when it resumed the problem hit.
So maybe there is a suspend-time script which did the lazy unmount.
I'll ask.

Meanwhile, I managed to reproduce it.  It went something like:

Client:

  mount -o soft,intr server:/home /mnt
  { echo open > /dev/tty ;sleep 10; echo writing > /dev/tty; \
    exec cat /boot/bzImage-test;  } > /mnt/testing

  "open"

Server:
   rpc.nfsd 0

Client:

   "writing"

   type control-C

   umount -l /mnt


  Wait a little while, and notice that one rpciod is in 'D' wait.
  echo t > /proc/sysrq-trigger

  See the same stack trace.


However, I have not been able to reproduce it a second time, so I
cannot test a fix.


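The only fix that occurs to me is to use schedule_work() to shunt the
rpc_shutdown_client() call into a separate thread, so that the rpc_task
which drops the last reference can exit instead of waiting on itself.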
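
To show the shape of that idea, here is a rough user-space model using
pthreads.  It is only a sketch, not kernel code: struct client,
shutdown_client(), task_done() and run_deferred_shutdown_demo() are
hypothetical names standing in for the rpc client, rpc_shutdown_client(),
the nfs_commit_done path and a test driver, and a new pthread stands in
for whatever context schedule_work() would provide.

```c
#include <pthread.h>

/* Hypothetical stand-in for an RPC client; not the kernel's structure. */
struct client {
    pthread_mutex_t lock;
    pthread_cond_t  idle;            /* signalled when a task exits */
    int             active_tasks;    /* tasks that have not yet exited */
    int             shut_down;       /* set once shutdown has completed */
    int             deferred_ok;     /* 1 if the shutdown thread started */
    pthread_t       shutdown_thread; /* context the shutdown was moved to */
};

/* Models the shutdown call: blocks until every task has exited. */
static void shutdown_client(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->active_tasks > 0)
        pthread_cond_wait(&c->idle, &c->lock);
    c->shut_down = 1;
    pthread_mutex_unlock(&c->lock);
}

/* Entry point for the separate context that performs the shutdown. */
static void *shutdown_worker(void *arg)
{
    shutdown_client(arg);
    return NULL;
}

/* Models a task's completion callback dropping the last reference. */
static void *task_done(void *arg)
{
    struct client *c = arg;

    /*
     * Calling shutdown_client(c) directly here would never return:
     * it waits for active_tasks to reach zero, and this task is still
     * counted.  Hand the shutdown to another thread instead.
     */
    c->deferred_ok =
        (pthread_create(&c->shutdown_thread, NULL, shutdown_worker, c) == 0);

    /* The task now exits, which is what allows the shutdown to finish. */
    pthread_mutex_lock(&c->lock);
    c->active_tasks--;
    pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
    return NULL;
}

/* Runs one task to completion; returns 1 if shutdown finished, -1 on error. */
int run_deferred_shutdown_demo(void)
{
    struct client c = { .active_tasks = 1 };
    pthread_t task;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.idle, NULL);

    if (pthread_create(&task, NULL, task_done, &c) != 0)
        return -1;
    pthread_join(task, NULL);

    if (!c.deferred_ok)
        return -1;
    pthread_join(c.shutdown_thread, NULL);

    pthread_cond_destroy(&c.idle);
    pthread_mutex_destroy(&c.lock);
    return c.shut_down;
}
```

Calling run_deferred_shutdown_demo() from a main() (built with -pthread)
should return once both threads have finished.  Replacing the
pthread_create() in task_done() with a direct shutdown_client(c) call
gives the hang described above, since the caller is waiting for itself
to exit.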

Thanks,
NeilBrown