Hi Sage,
On Fri, Oct 29, 2010 at 1:08 PM, Sage Weil <[email protected]> wrote:
> On Fri, 29 Oct 2010, Henry C Chang wrote:
>> Hi,
>>
>> getattr on mds hanged again.
>>
>> I have already reverted d91f2438d881514e4a923fd786dbd94b764a9440.
>> Although the probability is significant lowered down, it still has the
>> chance to hang on getattr.
>>
>> Attached are the logs of mds and the hanging client. :(
>>
>> I'm using ceph-client-standalone master-backport branch on 2.6.32 kernel.
>
> It looks like ceph_check_caps is hung somehow:
>
> ceph:ceph: handle_caps from mds0
> ceph:ceph: mds0 seq 99 cap seq 28
> ceph:ceph: op revoke ino 10000000bd1.fffffffffffffffe inode
> ffff8800a6251d88
> ceph:ceph: handle_cap_grant inode ffff8800a6251d88 cap ffff8800a635b780
> mds0 seq 28 pAsLsXsFr
> ceph:ceph: size 4294967296 max_size 8594128896, i_size 4294967296
> ceph:ceph: try_nonblocking_invalidate ffff8800a6251d88 success
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFscr
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFscr
> ceph:ceph: ffff8800a6251d88 mode 0100644 uid.gid 0.0
> ceph:ceph: my wanted = pAsxXsxFsxcrwb, used = pFcr, dirty -
> ceph:ceph: revocation: pAsLsXsFscr -> pAsLsXsFr (revoking Fsc)
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFr
> ceph:ceph: check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFcr
> dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> AUTHONLY NODELAY
> ceph:ceph: mds0 revoking Fsc
> ceph:ceph: mdsc put_session ffff8800b41c6000 3 -> 2
> ceph:ceph: mdsc con_put ffff8800b41c6000 (2)
> ceph:ceph: aio_read ffff8800a6251d88 10000000bd1.fffffffffffffffe
> dropping cap refs on Fcr = 512
> ceph:ceph: put_cap_refs ffff8800a6251d88 had Fcr last
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFr
> ceph:ceph: check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFc
> dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> ceph:ceph: check_caps trying to invalidate on ffff8800a6251d88
> ceph:ceph: try_nonblocking_invalidate ffff8800a6251d88 failed
> ceph:ceph: check_caps queuing invalidate
>
> --> this means queue_invalidate = 1, and check_caps will call
> ceph_queue_invalidate on exit, which will always print something...
>
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued
> pAsLsXsFr
> ceph:ceph: check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFc
> dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> ceph:ceph: mds0 revoking Fsc
> ceph:ceph: __cap_delay_cancel ffff8800a6251d88
>
> ...but that never happens. Probably the CPU got blocked somewhere? Can
> you see what the system is doing at this point? sysrq-t, or check the
> process list for ceph-msgr and cat it's stack (/proc/$pid/stack)? The
> task should be blocked in ceph_check_caps() somewhere...
it didn't go to ceph_queue_invalidate because delayed and is_delayed = 0.
if (delayed && is_delayed)
force_requeue = 1; /* __send_cap delayed release; requeue */
if (!delayed && !is_delayed)
__cap_delay_cancel(mdsc, ci);
else if (!is_delayed || force_requeue)
__cap_delay_requeue(mdsc, ci);
spin_unlock(&inode->i_lock);
if (queue_invalidate)
ceph_queue_invalidate(inode);
so it goes to __cap_delay_cancel().
Should I said, the correct behavior is to go to ceph_queue_invalidate(inode)?
>
> (BTW, if you're building your own kernel, one thing that I've found
> helpful is enabling the CONFIG_PRINTK_TIME option in .config, and updating
> kernel/printk.c to also include current->pid in the line prefix. That
> helps sort out what tasks are doing what when. But if you're stuck on
> 2.6.32 for some reason that probably not the case!)
>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html