Hi, Ilya

The kernel version is 3.10.106.
Here is the part of dmesg related to ceph:
[7349718.004905] libceph: osd297 down
[7349718.005190] libceph: osd299 down
[7349785.671015] libceph: osd295 down
[7350006.357509] libceph: osd291 weight 0x0 (out)
[7350006.357795] libceph: osd292 weight 0x0 (out)
[7350006.358075] libceph: osd293 weight 0x0 (out)
[7350006.358356] libceph: osd294 weight 0x0 (out)
[7350013.312399] libceph: osd289 weight 0x0 (out)
[7350013.312683] libceph: osd290 weight 0x0 (out)
[7350013.312964] libceph: osd296 weight 0x0 (out)
[7350013.313244] libceph: osd298 weight 0x0 (out)
[7350023.322571] libceph: osd288 weight 0x0 (out)
[7350038.338217] libceph: osd297 weight 0x0 (out)
[7350038.338501] libceph: osd299 weight 0x0 (out)
[7350115.364496] libceph: osd295 weight 0x0 (out)
[7350179.683200] libceph: osd294 weight 0x10000 (in)
[7350179.683495] libceph: osd294 up
[7350193.654197] libceph: osd293 weight 0x10000 (in)
[7350193.654486] libceph: osd297 weight 0x10000 (in)
[7350193.654769] libceph: osd293 up
[7350193.655046] libceph: osd297 up
[7350228.750112] libceph: osd299 weight 0x10000 (in)
[7350228.750399] libceph: osd299 up
[7350255.739415] libceph: osd289 weight 0x10000 (in)
[7350255.739700] libceph: osd289 up
[7350268.578031] libceph: osd288 weight 0x10000 (in)
[7350268.578315] libceph: osd288 up
[7383411.866068] libceph: osd299 down
[7383558.405675] libceph: osd299 up
[7387106.574308] libceph: osd291 weight 0x10000 (in)
[7387106.574593] libceph: osd291 up
[7387124.168198] libceph: osd296 weight 0x10000 (in)
[7387124.168492] libceph: osd296 up
[7387131.732934] libceph: osd292 weight 0x10000 (in)
[7387131.733218] libceph: osd292 up
[7387131.741277] libceph: osd290 weight 0x10000 (in)
[7387131.741558] libceph: osd290 up
[7387149.788781] libceph: osd298 weight 0x10000 (in)
[7387149.789066] libceph: osd298 up

One node of OSDs had been restarted a few days before.
And after evicting the session (done roughly as sketched after the log below), we saw:
[7679890.147116] libceph: mds0 x.x.x.x:6800 socket closed (con state OPEN)
[7679890.491439] libceph: mds0 x.x.x.x:6800 connection reset
[7679890.491727] libceph: reset on mds0
[7679890.492006] ceph: mds0 closed our session
[7679890.492286] ceph: mds0 reconnect start
[7679910.479911] ceph: mds0 caps stale
[7679927.886621] ceph: mds0 reconnect denied
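
For reference, this is roughly how we evicted the session (a sketch; mds.<name>
stands for the active MDS daemon, and <id> comes from the session list):

  # on the MDS host, via the admin socket
  ceph daemon mds.<name> session ls            # find the stuck client's session id
  ceph daemon mds.<name> session evict <id>    # evict that session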

We had to restart the machine to recover it.
I will send you an email if it happens again.
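
In case it helps, the way we inspect the kernel client's in-flight OSD requests
is roughly as follows (a sketch; the fsid and client-id directory names under
debugfs vary per mount):

  mount -t debugfs none /sys/kernel/debug    # only if debugfs is not already mounted
  cat /sys/kernel/debug/ceph/*/osdc          # lists outstanding OSD requests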

Thanks for your reply.

-----Original Message-----
From: Ilya Dryomov [mailto:idryo...@gmail.com]
Sent: November 13, 2017 17:30
To: 周 威 <cho...@msn.cn>
Cc: ceph-users@lists.ceph.com
Subject: Re: Re: [ceph-users] Where can I find the fix commit of #3370 ?

On Mon, Nov 13, 2017 at 10:18 AM, 周 威 <cho...@msn.cn> wrote:
> Hi, Ilya
>
> I'm using the kernel of CentOS 7, which should be 3.10. I checked the patch,
> and it appears in my kernel source.
> We got the same stack as #3370: the process is hung in sleep_on_page_killable.
> The debugfs ceph/osdc file shows there is a read request waiting for a response,
> while the command `ceph daemon osd.x ops` shows nothing.
> Evicting the session from the MDS does not help.
> The version of ceph cluster is 10.2.9.

I don't think it's related to that ticket.

Which version of CentOS 7?  Can you provide dmesg?

Is it reproducible?  A debug ms = 1 log for that OSD would help with narrowing 
this down.
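
For example, something like this should enable it on the fly (osd.x stands for
the affected OSD):

  ceph tell osd.x injectargs '--debug-ms 1'    # enable message-level logging
  ceph tell osd.x injectargs '--debug-ms 0'    # revert when done

or set "debug ms = 1" in ceph.conf and restart the OSD.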

Thanks,

                Ilya
