Hi, is there any other way to recover besides rebooting the server when the client hangs? If the server is in a production environment, I can't restart it every time.
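For reference, a rough sketch of what is sometimes tried before falling back to a reboot. The mount point, MDS name and client id below are placeholders, the eviction syntax varies between Ceph releases (and eviction normally blacklists the client), and, as the quoted thread below notes, none of this may help once a process is already stuck in uninterruptible IO, so treat it as an illustration rather than a recipe.

# on the MDS host: list client sessions, then evict the stuck client so the MDS drops its state
ceph daemon mds.<name> session ls
ceph tell mds.<rank> session evict id=<client-id>
# on the client: try a forced, then a lazy, unmount of the hung mount point
umount -f /mnt/cephfs
umount -l /mnt/cephfs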
Webert de Souza Lima <webert.b...@gmail.com> wrote on Wed, Aug 8, 2018 at 10:33 PM:

> Hi Zhenshi,
>
> if you still have the client mount hanging but no session is connected,
> you probably have some PID waiting with blocked IO from the cephfs mount.
> I face that now and then and the only solution is to reboot the server,
> as you won't be able to kill a process with pending IO.
>
> Regards,
>
> Webert Lima
> DevOps Engineer at MAV Tecnologia
> *Belo Horizonte - Brasil*
> *IRC NICK - WebertRLZ*
>
>
> On Wed, Aug 8, 2018 at 11:17 AM Zhenshi Zhou <deader...@gmail.com> wrote:
>
>> Hi Webert,
>> That command shows the current sessions, whereas the server from which I
>> got those files (osdc, mdsc, monc) has been disconnected for a long time.
>> So I cannot get useful information from the command you provided.
>>
>> Thanks
>>
>> Webert de Souza Lima <webert.b...@gmail.com> wrote on Wed, Aug 8, 2018 at 10:10 PM:
>>
>>> You could also see open sessions at the MDS server by issuing `ceph
>>> daemon mds.XX session ls`
>>>
>>> Regards,
>>>
>>> Webert Lima
>>> DevOps Engineer at MAV Tecnologia
>>> *Belo Horizonte - Brasil*
>>> *IRC NICK - WebertRLZ*
>>>
>>>
>>> On Wed, Aug 8, 2018 at 5:08 AM Zhenshi Zhou <deader...@gmail.com> wrote:
>>>
>>>> Hi, I found an old server which mounted cephfs and still has the debug files.
>>>> # cat osdc
>>>> REQUESTS 0 homeless 0
>>>> LINGER REQUESTS
>>>> BACKOFFS
>>>> # cat monc
>>>> have monmap 2 want 3+
>>>> have osdmap 3507
>>>> have fsmap.user 0
>>>> have mdsmap 55 want 56+
>>>> fs_cluster_id -1
>>>> # cat mdsc
>>>> 194 mds0 getattr #10000036ae3
>>>>
>>>> What does it mean?
>>>>
>>>> Zhenshi Zhou <deader...@gmail.com> wrote on Wed, Aug 8, 2018 at 1:58 PM:
>>>>
>>>>> I restarted the client server so that there's no file in that
>>>>> directory. I will take care of it if the client hangs next time.
>>>>>
>>>>> Thanks
>>>>>
>>>>> Yan, Zheng <uker...@gmail.com> wrote on Wed, Aug 8, 2018 at 11:23 AM:
>>>>>
>>>>>> On Wed, Aug 8, 2018 at 11:02 AM Zhenshi Zhou <deader...@gmail.com> wrote:
>>>>>> >
>>>>>> > Hi,
>>>>>> > I checked all my ceph servers and cephfs is not mounted on any of
>>>>>> > them (maybe I umounted it after testing). As a result, the cluster
>>>>>> > didn't encounter a memory deadlock. Besides, I checked the monitoring
>>>>>> > system and the memory and CPU usage were at normal levels while the
>>>>>> > clients hung.
>>>>>> > Back to my question, there must be something else causing the client
>>>>>> > hang.
>>>>>> >
>>>>>>
>>>>>> Check if there are hung requests in
>>>>>> /sys/kernel/debug/ceph/xxxx/{osdc,mdsc}
>>>>>>
>>>>>> > Zhenshi Zhou <deader...@gmail.com> wrote on Wed, Aug 8, 2018 at 4:16 AM:
>>>>>> >>
>>>>>> >> Hi, I'm not sure whether a cephfs mount that isn't used (no
>>>>>> >> operations within the mounted directory) would still be affected by
>>>>>> >> the cache flushing. I mounted cephfs on the osd servers only for
>>>>>> >> testing and then left it there. Anyway, I will umount it.
>>>>>> >>
>>>>>> >> Thanks
>>>>>> >>
>>>>>> >> John Spray <jsp...@redhat.com> wrote on Wed, Aug 8, 2018 at 3:37 AM:
>>>>>> >>>
>>>>>> >>> On Tue, Aug 7, 2018 at 5:42 PM Reed Dier <reed.d...@focusvq.com> wrote:
>>>>>> >>> >
>>>>>> >>> > This is the first I am hearing about this as well.
>>>>>> >>>
>>>>>> >>> This is not a Ceph-specific thing -- it can also affect similar
>>>>>> >>> systems like Lustre.
>>>>>> >>>
>>>>>> >>> The classic case is when, under some memory pressure, the kernel
>>>>>> >>> tries to free memory by flushing the client's page cache, but doing
>>>>>> >>> the flush means allocating more memory on the server, making the
>>>>>> >>> memory pressure worse, until the whole thing just seizes up.
>>>>>> >>>
>>>>>> >>> John
>>>>>> >>>
>>>>>> >>> > Granted, I am using ceph-fuse rather than the kernel client at
>>>>>> >>> > this point, but that isn’t etched in stone.
>>>>>> >>> >
>>>>>> >>> > Curious if there is more to share.
>>>>>> >>> >
>>>>>> >>> > Reed
>>>>>> >>> >
>>>>>> >>> > On Aug 7, 2018, at 9:47 AM, Webert de Souza Lima
>>>>>> >>> > <webert.b...@gmail.com> wrote:
>>>>>> >>> >
>>>>>> >>> > Yan, Zheng <uker...@gmail.com> wrote on Tue, Aug 7, 2018 at 7:51 PM:
>>>>>> >>> >>
>>>>>> >>> >> On Tue, Aug 7, 2018 at 7:15 PM Zhenshi Zhou <deader...@gmail.com> wrote:
>>>>>> >>> >> this can cause memory deadlock. you should avoid doing this
>>>>>> >>> >>
>>>>>> >>> >> > Yan, Zheng <uker...@gmail.com> wrote on Tue, Aug 7, 2018 at 7:12 PM:
>>>>>> >>> >> >>
>>>>>> >>> >> >> did you mount cephfs on the same machines that run ceph-osd?
>>>>>> >>> >
>>>>>> >>> >
>>>>>> >>> > I didn't know about this. I run this setup in production. :P
>>>>>> >>> >
>>>>>> >>> > Regards,
>>>>>> >>> >
>>>>>> >>> > Webert Lima
>>>>>> >>> > DevOps Engineer at MAV Tecnologia
>>>>>> >>> > Belo Horizonte - Brasil
>>>>>> >>> > IRC NICK - WebertRLZ
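Following the advice quoted above, here is a rough sketch of how the hung state can be inspected on the client side. It assumes the kernel client, a root shell, and debugfs mounted at /sys/kernel/debug; the reading of the mdsc line (request id, target MDS rank, operation, inode number in hex) is my interpretation of the kernel client's debug output, so please verify against your kernel version.

# dump in-flight MDS and OSD requests for every kernel cephfs mount
for d in /sys/kernel/debug/ceph/*/; do echo "== ${d}"; cat "${d}mdsc" "${d}osdc"; done
# an entry that never drains, e.g. "194 mds0 getattr #10000036ae3", is a request
# still waiting on mds.0 for inode 0x10000036ae3
# list processes stuck in uninterruptible sleep (state D), i.e. blocked on that IO
ps -eo pid,stat,wchan:32,comm | awk 'NR==1 || $2 ~ /D/'
# see where a given stuck process is blocked in the kernel (<pid> is a placeholder)
cat /proc/<pid>/stack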
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com