Hi, is there any other way to recover besides rebooting the server when the client hangs? If the server is in a production environment, I can't restart it every time.
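For reference, a rough sketch of what is sometimes tried before falling back to a reboot. The mount point, MDS name and client id below are placeholders, the eviction syntax varies between Ceph releases (and eviction normally blacklists the client), and, as the quoted thread below notes, none of this may help once a process is already stuck in uninterruptible IO, so treat it as an illustration rather than a recipe.

# on the MDS host: list client sessions, then evict the stuck client so the MDS drops its state
ceph daemon mds.<name> session ls
ceph tell mds.<rank> session evict id=<client-id>
# on the client: try a forced, then a lazy, unmount of the hung mount point
umount -f /mnt/cephfs
umount -l /mnt/cephfs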
Webert de Souza Lima <webert.b...@gmail.com> wrote on Wed, Aug 8, 2018 at 10:33 PM:

> Hi Zhenshi,
>
> if you still have the client mount hanging but no session is connected,
> you probably have some PID waiting with blocked IO from the cephfs mount.
> I face that now and then and the only solution is to reboot the server,
> as you won't be able to kill a process with pending IO.
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
>
>
> On Wed, Aug 8, 2018 at 11:17 AM Zhenshi Zhou <deader...@gmail.com> wrote:
>
>> Hi Webert,
>> That command shows the current sessions, whereas the server from which I
>> got those files (osdc, mdsc, monc) has been disconnected for a long time.
>> So I cannot get useful information from the command you provided.
>>
>> Thanks
>>
>> Webert de Souza Lima <webert.b...@gmail.com> wrote on Wed, Aug 8, 2018 at 10:10 PM:
>>
>>> You could also see open sessions at the MDS server by issuing `ceph
>>> daemon mds.XX session ls`
>>>
>>> Regards,
>>>
>>> Webert Lima
>>> DevOps Engineer at MAV Tecnologia
>>> *Belo Horizonte - Brasil*
>>> *IRC NICK - WebertRLZ*
>>>
>>>
>>> On Wed, Aug 8, 2018 at 5:08 AM Zhenshi Zhou <deader...@gmail.com> wrote:
>>>
>>>> Hi, I found an old server which mounted cephfs and still has the debug files.
>>>> # cat osdc
>>>> REQUESTS 0 homeless 0
>>>> LINGER REQUESTS
>>>> BACKOFFS
>>>> # cat monc
>>>> have monmap 2 want 3+
>>>> have osdmap 3507
>>>> have fsmap.user 0
>>>> have mdsmap 55 want 56+
>>>> fs_cluster_id -1
>>>> # cat mdsc
>>>> 194 mds0 getattr #10000036ae3
>>>>
>>>> What does it mean?
>>>>
>>>> Zhenshi Zhou <deader...@gmail.com> wrote on Wed, Aug 8, 2018 at 1:58 PM:
>>>>
>>>>> I restarted the client server so that there's no file in that
>>>>> directory. I will take care of it if the client hangs next time.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Yan, Zheng <uker...@gmail.com> wrote on Wed, Aug 8, 2018 at 11:23 AM:
>>>>>
>>>>>> On Wed, Aug 8, 2018 at 11:02 AM Zhenshi Zhou <deader...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> > I checked all my ceph servers and cephfs is not mounted on any of
>>>>>> > them (maybe I umounted it after testing). As a result, the cluster
>>>>>> > didn't encounter a memory deadlock. Besides, I checked the monitoring
>>>>>> > system and the memory and CPU usage were at normal levels while the
>>>>>> > clients hung.
>>>>>> > Back to my question, there must be something else causing the client
>>>>>> > hang.
>>>>>> >
>>>>>>
>>>>>> Check if there are hung requests in
>>>>>> /sys/kernel/debug/ceph/xxxx/{osdc,mdsc}
>>>>>>
>>>>>> > Zhenshi Zhou <deader...@gmail.com> wrote on Wed, Aug 8, 2018 at 4:16 AM:
>>>>>> >>
>>>>>> >> Hi, I'm not sure whether a cephfs mount that isn't used (no
>>>>>> >> operations within the mounted directory) would still be affected by
>>>>>> >> the cache flushing. I mounted cephfs on the osd servers only for
>>>>>> >> testing and then left it there. Anyway, I will umount it.
>>>>>> >>
>>>>>> >> Thanks
>>>>>> >>
>>>>>> >> John Spray <jsp...@redhat.com> wrote on Wed, Aug 8, 2018 at 3:37 AM:
>>>>>> >>>
>>>>>> >>> On Tue, Aug 7, 2018 at 5:42 PM Reed Dier <reed.d...@focusvq.com> wrote:
>>>>>> >>> >
>>>>>> >>> > This is the first I am hearing about this as well.
>>>>>> >>>
>>>>>> >>> This is not a Ceph-specific thing -- it can also affect similar
>>>>>> >>> systems like Lustre.
>>>>>> >>>
>>>>>> >>> The classic case is when, under some memory pressure, the kernel
>>>>>> >>> tries to free memory by flushing the client's page cache, but doing
>>>>>> >>> the flush means allocating more memory on the server, making the
>>>>>> >>> memory pressure worse, until the whole thing just seizes up.
>>>>>> >>>
>>>>>> >>> John
>>>>>> >>>
>>>>>> >>> > Granted, I am using ceph-fuse rather than the kernel client at
>>>>>> >>> > this point, but that isn’t etched in stone.
>>>>>> >>> >
>>>>>> >>> > Curious if there is more to share.
>>>>>> >>> >
>>>>>> >>> > Reed
>>>>>> >>> >
>>>>>> >>> > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima
>>>>>> >>> > <webert.b...@gmail.com> wrote:
>>>>>> >>> >
>>>>>> >>> > Yan, Zheng <uker...@gmail.com> wrote on Tue, Aug 7, 2018 at 7:51 PM:
>>>>>> >>> >>
>>>>>> >>> >> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou <deader...@gmail.com> wrote:
>>>>>> >>> >> this can cause memory deadlock. you should avoid doing this
>>>>>> >>> >>
>>>>>> >>> >> > Yan, Zheng <uker...@gmail.com> wrote on Tue, Aug 7, 2018 at 7:12 PM:
>>>>>> >>> >> >>
>>>>>> >>> >> >> did you mount cephfs on the same machines that run ceph-osd?
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> > I didn't know about this. I run this setup in production. :P
>>>>>> >>> >
>>>>>> >>> > Regards,
>>>>>> >>> >
>>>>>> >>> > Webert Lima
>>>>>> >>> > DevOps Engineer at MAV Tecnologia
>>>>>> >>> > Belo Horizonte - Brasil
>>>>>> >>> > IRC NICK - WebertRLZ
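Following the advice quoted above, here is a rough sketch of how the hung state can be inspected on the client side. It assumes the kernel client, a root shell, and debugfs mounted at /sys/kernel/debug; the reading of the mdsc line (request id, target MDS rank, operation, inode number in hex) is my interpretation of the kernel client's debug output, so please verify against your kernel version.

# dump in-flight MDS and OSD requests for every kernel cephfs mount
for d in /sys/kernel/debug/ceph/*/; do echo "== ${d}"; cat "${d}mdsc" "${d}osdc"; done
# an entry that never drains, e.g. "194 mds0 getattr #10000036ae3", is a request
# still waiting on mds.0 for inode 0x10000036ae3
# list processes stuck in uninterruptible sleep (state D), i.e. blocked on that IO
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /D/'
# see where a given stuck process is blocked in the kernel (<pid> is a placeholder)
cat /proc/<pid>/stack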
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com