On Fri, 18 Mar 2011, huang jun wrote:
> Hi,
> My Ceph cluster has 1 mon, 1 mds and 6 osds.
> I use a client to write files, but after two days the client can no
> longer write; dmesg shows:
> [152312.784043] libceph:  tid 221531 timed out on osd2, will reset osd
> [152322.800025] libceph:  tid 221534 timed out on osd1, will reset osd
> [152362.864029] libceph:  tid 221553 timed out on osd3, will reset osd
> [152362.864115] libceph:  tid 221556 timed out on osd0, will reset osd
> [152362.864175] libceph:  tid 221558 timed out on osd4, will reset osd
> [152362.864236] libceph:  tid 221568 timed out on osd5, will reset osd
> [152372.880024] libceph:  tid 221531 timed out on osd2, will reset osd
> [152432.976030] libceph:  tid 221531 timed out on osd2, will reset osd
> [152493.072035] libceph:  tid 221531 timed out on osd2, will reset osd
> [152553.168039] libceph:  tid 221531 timed out on osd2, will reset osd
> [152613.264027] libceph:  tid 221531 timed out on osd2, will reset osd
> [152673.360028] libceph:  tid 221531 timed out on osd2, will reset osd
> [152733.456028] libceph:  tid 221531 timed out on osd2, will reset osd
> [152793.552026] libceph:  tid 221531 timed out on osd2, will reset osd
> [152853.648025] libceph:  tid 221531 timed out on osd2, will reset osd
> [152913.744029] libceph:  tid 221531 timed out on osd2, will reset osd
> [152973.840026] libceph:  tid 221531 timed out on osd2, will reset osd
> [153033.936026] libceph:  tid 221531 timed out on osd2, will reset osd
> 
> And on osd2, dmesg shows:
> 
> [140056.772753] btrfs: truncated 1 orphans
> [140108.340423] btrfs: truncated 1 orphans
> [141681.918175] btrfs: truncated 1 orphans
> [148394.437973] btrfs: truncated 1 orphans
> [152007.353121] btrfs: truncated 1 orphans
> [152338.400197] btrfs: truncated 1 orphans
> [152880.944055] INFO: task btrfs-transacti:3046 blocked for more than 120 seconds.
> [152880.944341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [152880.944664] btrfs-transac D ffff88007e0996c8     0  3046      2 0x00000000
> [152880.944677]  ffff88007e28d6f0 0000000000000046 0000000000000002 0000000000013500
> [152880.944688]  ffff88006f5b3fd8 ffff88006f5b3fd8 ffff88007e099430 0000000000013500
> [152880.944699]  0000000000013500 0000000000013500 ffff88007e099430 0000000000000286
> [152880.944710] Call Trace:
> [152880.944727]  [<ffffffff8114c646>] ? wait_for_commit+0x8f/0xd5
> [152880.944738]  [<ffffffff810536e2>] ? autoremove_wake_function+0x0/0x2e
> [152880.944748]  [<ffffffff8114d4cf>] ? btrfs_commit_transaction+0xff/0x5ec
> [152880.944759]  [<ffffffff8130ecb4>] ? schedule_timeout+0x202/0x222
> [152880.944769]  [<ffffffff810536e2>] ? autoremove_wake_function+0x0/0x2e
> [152880.944779]  [<ffffffff8114928d>] ? transaction_kthread+0x158/0x20c
> [152880.944789]  [<ffffffff81149135>] ? transaction_kthread+0x0/0x20c
> [152880.944798]  [<ffffffff81053299>] ? kthread+0x79/0x81
> [152880.944808]  [<ffffffff81003824>] ? kernel_thread_helper+0x4/0x10
> [152880.944818]  [<ffffffff81053220>] ? kthread+0x0/0x81
> [152880.944827]  [<ffffffff81003820>] ? kernel_thread_helper+0x0/0x10
> [152880.944837] INFO: task cosd:3137 blocked for more than 120 seconds.
> [152880.945157] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [152880.945513] cosd          D ffff88006f5b5038     0  3137      1 0x00000000
> [152880.945523]  ffff88007ebd34b0 0000000000000086 0000000000000000 0000000000013500
> [152880.945531]  ffff88007e2dffd8 ffff88007e2dffd8 ffff88006f5b4da0 0000000000013500
> [152880.945540]  0000000000013500 0000000000013500 ffff88006f5b4da0 ffffffff810a5ba3
> 
> Everything seems OK after we mark osd2 out; the client works smoothly.
> Is the problem related to btrfs?

Yes. Here cosd is blocked behind the stuck btrfs transaction commit, and
when the OSD gets hung up like that, client writes to it stall.
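As a rough illustration (a hypothetical helper, not a Ceph tool), the
client's dmesg can be scanned to see which OSD the stalled requests are
stuck on.  The regex assumes the exact libceph message format quoted
above; a tid that times out repeatedly points at the hung OSD.

```python
import re
from collections import Counter

# Matches lines such as:
# [152312.784043] libceph:  tid 221531 timed out on osd2, will reset osd
TIMEOUT_RE = re.compile(r"libceph:\s+tid (\d+) timed out on osd(\d+)")


def count_osd_timeouts(lines):
    """Return a Counter mapping OSD id -> number of timeout messages."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            counts[int(m.group(2))] += 1
    return counts


def stuck_tids(lines):
    """Return {tid: osd} for request tids that timed out more than once."""
    seen = Counter()
    osd_of = {}
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            tid, osd = int(m.group(1)), int(m.group(2))
            seen[tid] += 1
            osd_of[tid] = osd
    return {tid: osd_of[tid] for tid, n in seen.items() if n > 1}
```

Fed the client log above, stuck_tids() singles out tid 221531 on osd2,
which matches the hung btrfs/cosd tasks reported on that host.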

We added a timeout mechanism that should force the cosd daemon to fail 
when the underlying file system isn't responsive, though... which version 
are you running on the server side?
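The general idea of such a timeout can be sketched as a watchdog around a
potentially blocking file system call.  This is a hypothetical
illustration only (run_with_watchdog and its parameters are made-up
names), not the actual cosd implementation: the call runs in a worker
thread, and if it has not returned within timeout_s, on_timeout() is
invoked instead of letting the caller wait forever.

```python
import threading


def run_with_watchdog(op, timeout_s, on_timeout):
    """Run op() in a worker thread; call on_timeout() if it doesn't finish in time.

    Hypothetical sketch, not Ceph code.  Returns op()'s result, re-raises
    any exception op() raised, or returns None after firing on_timeout().
    """
    done = threading.Event()
    result = {}

    def worker():
        try:
            result["value"] = op()
        except BaseException as exc:  # hand the error back to the caller
            result["error"] = exc
        finally:
            done.set()

    # Daemon thread: a call stuck in the kernel must not keep the
    # interpreter alive on exit.
    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout_s):
        on_timeout()
        return None
    if "error" in result:
        raise result["error"]
    return result.get("value")
```

In a daemon, on_timeout would typically terminate the process (for
example with os.abort()), so that a hung file system shows up as a failed
OSD rather than as silently stalled client writes.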

sage
