On Sat, 2014-09-27 at 19:41 +0100, Mike Crowe wrote:
> I compiled my own version of the Debian 3.2.60-1+deb7u3 kernel with
> CONFIG_LOCKDEP and panic on hung task enabled.
> 
> From the crash dump:
> 
> [25202.156175] INFO: task nfsd:3247 blocked for more than 900 seconds.
> [25202.162565] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [25202.170432] nfsd            D ffff88080aa0eca8     0  3247      2 0x00000000
> [25202.170444]  ffff88080a8e19f0 0000000000000046 0000000000000006 ffff880800000000
> [25202.170458]  ffff88080aa0e9c0 ffff88080a8e1fd8 ffff88080a8e1fd8 00000000001d4040
> [25202.170472]  ffff88040e9926c0 ffff88080aa0e9c0 ffffffff8138d6da 00000001a04c47dd
> [25202.170488] Call Trace:
> [25202.170504]  [<ffffffff8138d6da>] ? __mutex_lock_common+0x236/0x379
> [25202.170531]  [<ffffffffa04c47dd>] ? fh_lock_nested+0x4d/0x61 [nfsd]
> [25202.170542]  [<ffffffff8138cda2>] schedule+0x55/0x57
> [25202.170552]  [<ffffffff8138d6e7>] __mutex_lock_common+0x243/0x379
> [25202.170569]  [<ffffffffa04c47dd>] ? fh_lock_nested+0x4d/0x61 [nfsd]
> [25202.170581]  [<ffffffff8138d8dc>] mutex_lock_nested+0x2a/0x31
> [25202.170598]  [<ffffffffa04c47dd>] fh_lock_nested+0x4d/0x61 [nfsd]
> [25202.170610]  [<ffffffff810140f5>] ? sched_clock+0x9/0xd
> [25202.170626]  [<ffffffffa04c50fe>] nfsd_lookup_dentry+0x196/0x227 [nfsd]
> [25202.170646]  [<ffffffffa04cef7f>] nfsd4_secinfo.part.15+0x26/0x9e [nfsd]
> [25202.170666]  [<ffffffffa04cf044>] nfsd4_secinfo+0x4d/0x5b [nfsd]
> [25202.170688]  [<ffffffffa04ce105>] nfsd4_proc_compound+0x265/0x43e [nfsd]
> [25202.170703]  [<ffffffffa04c181d>] nfsd_dispatch+0xe2/0x1c8 [nfsd]
> [25202.170734]  [<ffffffffa03759c1>] svc_process_common+0x2cf/0x4d0 [sunrpc]
> [25202.170759]  [<ffffffffa0375de0>] svc_process+0x118/0x136 [sunrpc]
> [25202.170773]  [<ffffffffa04c10eb>] nfsd+0xeb/0x131 [nfsd]
> [25202.170796]  [<ffffffffa04c1000>] ? 0xffffffffa04c0fff
> [25202.170806]  [<ffffffff81065c75>] kthread+0xa3/0xab
> [25202.170815]  [<ffffffff81396584>] kernel_thread_helper+0x4/0x10
> [25202.170823]  [<ffffffff8138f074>] ? retint_restore_args+0x13/0x13
> [25202.170830]  [<ffffffff81065bd2>] ? __init_kthread_worker+0x53/0x53
> [25202.170837]  [<ffffffff81396580>] ? gs_change+0x13/0x13
> [25202.170842] 1 lock held by nfsd/3247:
> [25202.170845]  #0:  (&sb->s_type->i_mutex_key#13){+.+.+.}, at: [<ffffffffa04c47dd>] fh_lock_nested+0x4d/0x61 [nfsd]
> [25202.170870] Kernel panic - not syncing: hung_task: blocked tasks
> 
> I'm no expert at interpreting lockdep output but I think this is saying
> that nfsd is taking a nested lock and then deadlocks trying to take it
> again (which presumably shouldn't happen because it is nested.)

nfsd is trying to lock two objects in the same class: specifically, it
locks a file handle and then the file handle for the file's parent.
It's generally safe to do this so long as they're always taken in that
order.  lockdep should complain (much more verbosely) if this is not
done consistently.
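
To illustrate the ordering rule outside the kernel, here is a rough
userspace sketch using pthreads.  It is not nfsd code: struct obj, the
rank field, lock_two() and run_demo() are all names made up for this
example, and the rank simply stands in for whatever fixed order the
callers have agreed on.  Two threads name the same pair of objects in
opposite order, but because lock_two() always takes the lower rank
first, they cannot deadlock against each other.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical object with its own mutex; all objects share one lock "class". */
struct obj {
    pthread_mutex_t lock;
    int rank;    /* position in the agreed locking order; lower is locked first */
    long value;
};

/* Take both locks in rank order, regardless of argument order. */
void lock_two(struct obj *a, struct obj *b)
{
    assert(a->rank != b->rank);    /* the order must be unambiguous */
    struct obj *first = (a->rank < b->rank) ? a : b;
    struct obj *second = (first == a) ? b : a;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);
}

/* Release order does not affect deadlock safety. */
void unlock_two(struct obj *a, struct obj *b)
{
    pthread_mutex_unlock(&a->lock);
    pthread_mutex_unlock(&b->lock);
}

struct job {
    struct obj *x;
    struct obj *y;
    int iterations;
};

static void *worker(void *arg)
{
    struct job *j = arg;

    for (int i = 0; i < j->iterations; i++) {
        lock_two(j->x, j->y);
        j->x->value++;
        j->y->value++;
        unlock_two(j->x, j->y);
    }
    return NULL;
}

/*
 * Run two threads that pass the objects in opposite order and
 * return the sum of both counters once they have finished.
 */
long run_demo(int iterations)
{
    struct obj parent = { .rank = 0, .value = 0 };
    struct obj child = { .rank = 1, .value = 0 };
    struct job j1 = { &parent, &child, iterations };
    struct job j2 = { &child, &parent, iterations };
    pthread_t t1, t2;

    pthread_mutex_init(&parent.lock, NULL);
    pthread_mutex_init(&child.lock, NULL);

    pthread_create(&t1, NULL, worker, &j1);
    pthread_create(&t2, NULL, worker, &j2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    pthread_mutex_destroy(&parent.lock);
    pthread_mutex_destroy(&child.lock);
    return parent.value + child.value;
}
```

If lock_two() instead locked its arguments in the order given, the two
threads could each hold one mutex while waiting for the other.  That
is the kind of inconsistency lockdep is designed to report in the
kernel, where the subclass passed to mutex_lock_nested() tells it
which level of the nesting is being taken.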

I'm afraid this doesn't explain what's going wrong.  But if there are
any more messages from lockdep further up the log (like, 15 minutes
earlier), they might do.

Ben.

-- 
Ben Hutchings
This sentence contradicts itself - no actually it doesn't.