On Thursday January 10, [EMAIL PROTECTED] wrote:
> 
> On Fri, 2008-01-11 at 08:51 +1100, Neil Brown wrote:
> > 
> > Is this a credible scenario?
> 
> Yes, but I have a scenario that I think trumps it:
> 
>   * the call that puts the open context is being made in
>     nfs_commit_done (or possibly nfs_writeback_done), causing it to
>     wait until the rpc_killall_tasks completes.
>   * The problem is that rpc_killall_tasks won't complete until the
>     rpc_task that is stuck in nfs_commit_done/nfs_writeback_done
>     exits.
> 
> Urgh...
Oh, yes, that would be ugly!

> 
> I'm surprised that we can get into this state, though. How is
> sys_umount() able to exit with either readaheads or writebacks still
> pending? Is this perhaps occurring on a lazy umount?

It does sound like it has to be a lazy unmount, doesn't it?
I don't think autofs will do that.
The computer in question was suspended while on the slow slow network, and
when it resumed the problem hit. So maybe there is a suspend-time script
that did the lazy unmount. I'll ask.

Meanwhile, I managed to reproduce it. It went something like:

Client:
   mount -o soft,intr server:/home /mnt
   { echo open > /dev/tty ;sleep 10; echo writing > /dev/tty; \
     exec cat /boot/bzImage-test; } > /mnt/testing
   "open"
Server:
   rpc.nfsd 0
Client:
   "writing"
   type control-C
   umount -l /mnt
   wait a little while and notice that one rpciod is in 'D' wait.
   echo t > /proc/sysrq-trigger
   See the same stack trace.

However, I cannot reproduce it again, so I cannot test a fix.

The only fix that occurs to me is to use schedule_work to shunt the
rpc_shutdown_client call into a separate thread.

Thanks,
NeilBrown
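The schedule_work idea can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: `struct client`, `client_shutdown`, `task_exit`, `rpciod` and `run_demo` are hypothetical names, POSIX threads stand in for rpciod and the workqueue, and the actual killing of tasks is left out. It only shows why a completion callback must not wait for the shutdown itself (the waiter is one of the tasks being waited for) and how handing that wait to another thread breaks the cycle.

```c
#include <pthread.h>

/* Hypothetical userspace model (NOT kernel code): one "rpciod" thread runs
 * task callbacks, and shutting the client down means waiting until no task
 * is still in flight, loosely what rpc_shutdown_client() waits for. */
struct client {
    pthread_mutex_t lock;
    pthread_cond_t idle;
    int active_tasks;   /* tasks that have not exited yet */
    int shut_down;      /* set once the shutdown wait has finished */
    pthread_t deferred; /* thread running the deferred shutdown */
};

/* Shutdown: block until every task has exited. */
static void client_shutdown(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    while (c->active_tasks > 0)
        pthread_cond_wait(&c->idle, &c->lock);
    c->shut_down = 1;
    pthread_mutex_unlock(&c->lock);
}

static void *shutdown_work(void *arg)
{
    client_shutdown(arg);
    return NULL;
}

/* A task exits: drop the count and wake any shutdown waiter. */
static void task_exit(struct client *c)
{
    pthread_mutex_lock(&c->lock);
    if (--c->active_tasks == 0)
        pthread_cond_broadcast(&c->idle);
    pthread_mutex_unlock(&c->lock);
}

/* Stand-in for rpciod running the last task's completion callback
 * (in the report: nfs_commit_done/nfs_writeback_done). Calling
 * client_shutdown() directly here would wait for active_tasks == 0, but
 * this task is still counted and only exits after the callback returns,
 * so the thread would wait on itself forever. Instead the shutdown is
 * handed to a separate thread, which is the schedule_work idea. */
static void *rpciod(void *arg)
{
    struct client *c = arg;
    pthread_create(&c->deferred, NULL, shutdown_work, c);
    task_exit(c); /* the callback returns and the task exits */
    return NULL;
}

/* Runs the scenario once; returns 1 if the deferred shutdown completed. */
int run_demo(void)
{
    struct client c;
    pthread_t worker;

    pthread_mutex_init(&c.lock, NULL);
    pthread_cond_init(&c.idle, NULL);
    c.active_tasks = 1; /* the single in-flight task */
    c.shut_down = 0;

    pthread_create(&worker, NULL, rpciod, &c);
    pthread_join(worker, NULL);     /* "rpciod" never blocks on itself */
    pthread_join(c.deferred, NULL); /* wait for the deferred shutdown */

    pthread_cond_destroy(&c.idle);
    pthread_mutex_destroy(&c.lock);
    return c.shut_down;
}
```

Built with -pthread and called from a main(), run_demo() returns 1 whichever thread runs first: if the deferred shutdown starts before the task exits, it simply sleeps on the condition variable until task_exit() broadcasts. In the kernel the separate thread would be a workqueue worker rather than a freshly created thread, but the point is the same: the thread that owns the last task must not be the one that waits for it.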