Hi,
The kernel version is 4.12.8-1.el7.elrepo.x86_64.
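If it helps to double-check from the MDS side, I believe the kernel version a
client reports should also show up in its session metadata, with something
like (the mds name is a placeholder):

    ceph daemon mds.<mds-name> session ls

which should list each session's "client_metadata", including "kernel_version"
and "hostname".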
Client.267792 is gone because I restarted the server over the weekend.
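For next time, I assume the stuck session could also be dropped without a full
reboot by evicting it on the MDS, roughly:

    ceph tell mds.0 client evict id=<client-id>

(with the id of the misbehaving client from the log), though I haven't tried
that on this cluster yet.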
Is ceph-fuse more stable than the kernel client?
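In case I end up switching, I assume a ceph-fuse mount would look roughly like
this compared with the current kernel mount (monitor address, client name and
mount point are placeholders):

    # kernel client
    mount -t ceph <mon-host>:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret

    # ceph-fuse
    ceph-fuse --id admin -m <mon-host>:6789 /mnt/cephfs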

Yan, Zheng <uker...@gmail.com> wrote on Mon, Aug 27, 2018 at 11:41 AM:

> Please check client.213528 instead of client.267792. Which kernel version
> does client.213528 use?
> On Sat, Aug 25, 2018 at 6:12 AM Zhenshi Zhou <deader...@gmail.com> wrote:
> >
> > Hi,
> > This time,  osdc:
> >
> > REQUESTS 0 homeless 0
> > LINGER REQUESTS
> >
> > monc:
> >
> > have monmap 2 want 3+
> > have osdmap 4545 want 4546
> > have fsmap.user 0
> > have mdsmap 446 want 447+
> > fs_cluster_id -1
> >
> > mdsc:
> >
> > 649065  mds0    setattr  #100002e7e5a
> >
> > Anything useful?
> >
> >
> >
> > Yan, Zheng <uker...@gmail.com> wrote on Sat, Aug 25, 2018 at 7:53 AM:
> >>
> >> Are there hung requests in /sys/kernel/debug/ceph/xxxx/osdc?
> >>
> >> On Fri, Aug 24, 2018 at 9:32 PM Zhenshi Zhou <deader...@gmail.com> wrote:
> >> >
> >> > I'm afraid the client is hanging again... the log shows:
> >> >
> >> > 2018-08-24 21:27:54.714334 [WRN]  slow request 62.607608 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:27:54.714320 [WRN]  3 slow requests, 1 included below; oldest blocked for > 843.556758 secs
> >> > 2018-08-24 21:27:24.713740 [WRN]  slow request 32.606979 seconds old, received at 2018-08-24 21:26:52.106633: client_request(client.213528:241811 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:26:52.106425 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:27:24.713729 [WRN]  3 slow requests, 1 included below; oldest blocked for > 813.556129 secs
> >> > 2018-08-24 21:25:49.711778 [WRN]  slow request 483.807963 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:25:49.711766 [WRN]  2 slow requests, 1 included below; oldest blocked for > 718.554206 secs
> >> > 2018-08-24 21:21:54.707536 [WRN]  client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 483.548912 seconds ago
> >> > 2018-08-24 21:21:54.706930 [WRN]  slow request 483.549363 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> >> > 2018-08-24 21:21:54.706920 [WRN]  2 slow requests, 1 included below; oldest blocked for > 483.549363 secs
> >> > 2018-08-24 21:21:49.706838 [WRN]  slow request 243.803027 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:21:49.706828 [WRN]  2 slow requests, 1 included below; oldest blocked for > 478.549269 secs
> >> > 2018-08-24 21:19:49.704294 [WRN]  slow request 123.800486 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:19:49.704284 [WRN]  2 slow requests, 1 included below; oldest blocked for > 358.546729 secs
> >> > 2018-08-24 21:18:49.703073 [WRN]  slow request 63.799269 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:18:49.703062 [WRN]  2 slow requests, 1 included below; oldest blocked for > 298.545511 secs
> >> > 2018-08-24 21:18:19.702465 [WRN]  slow request 33.798637 seconds old, received at 2018-08-24 21:17:45.903726: client_request(client.213528:241810 getattr pAsLsXsFs #0x100002e7e5a 2018-08-24 21:17:45.903049 caller_uid=0, caller_gid=0{}) currently failed to rdlock, waiting
> >> > 2018-08-24 21:18:19.702456 [WRN]  2 slow requests, 1 included below; oldest blocked for > 268.544880 secs
> >> > 2018-08-24 21:17:54.702517 [WRN]  client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 243.543893 seconds ago
> >> > 2018-08-24 21:17:54.701904 [WRN]  slow request 243.544331 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> >> > 2018-08-24 21:17:54.701894 [WRN]  1 slow requests, 1 included below; oldest blocked for > 243.544331 secs
> >> > 2018-08-24 21:15:54.700034 [WRN]  client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 123.541410 seconds ago
> >> > 2018-08-24 21:15:54.699385 [WRN]  slow request 123.541822 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> >> > 2018-08-24 21:15:54.699375 [WRN]  1 slow requests, 1 included below; oldest blocked for > 123.541822 secs
> >> > 2018-08-24 21:14:57.055183 [WRN]  Health check failed: 1 clients failing to respond to capability release (MDS_CLIENT_LATE_RELEASE)
> >> > 2018-08-24 21:14:56.167868 [WRN]  MDS health message (mds.0): Client docker39 failing to respond to capability release
> >> > 2018-08-24 21:14:54.698753 [WRN]  client.213528 isn't responding to mclientcaps(revoke), ino 0x100002e7e5a pending pAsLsXsFr issued pAsLsXsFscr, sent 63.540127 seconds ago
> >> > 2018-08-24 21:14:54.698104 [WRN]  slow request 63.540533 seconds old, received at 2018-08-24 21:13:51.157483: client_request(client.267792:649065 setattr size=0 mtime=2018-08-24 21:13:51.163236 #0x100002e7e5a 2018-08-24 21:13:51.163236 caller_uid=0, caller_gid=0{}) currently failed to xlock, waiting
> >> > 2018-08-24 21:14:54.698086 [WRN]  1 slow requests, 1 included below; oldest blocked for > 63.540533 secs
> >> > 2018-08-24 21:14:28.217536 [WRN]  Health check failed: 1 MDSs report slow requests (MDS_SLOW_REQUEST)
> >> > 2018-08-24 21:14:28.167096 [WRN]  MDS health message (mds.0): 1 slow requests are blocked > 30 sec
> >> >
> >> >
> >> >
> >> > Yan, Zheng <uker...@gmail.com> wrote on Tue, Aug 14, 2018 at 3:13 PM:
> >> >>
> >> >> On Mon, Aug 13, 2018 at 9:55 PM Zhenshi Zhou <deader...@gmail.com> wrote:
> >> >> >
> >> >> > Hi Burkhard,
> >> >> > I'm sure the user has permission to read and write. Besides, we're
> >> >> > not using EC data pools.
> >> >> > Now the situation is that any operation on one specific file hangs,
> >> >> > while operations on any other files don't.
> >> >> >
> >> >>
> >> >> Can the ceph-fuse client read the specific file?
> >> >>
> >> >> > Burkhard Linke <burkhard.li...@computational.bio.uni-giessen.de>
> >> >> > wrote on Mon, Aug 13, 2018 at 9:42 PM:
> >> >> >>
> >> >> >> Hi,
> >> >> >>
> >> >> >>
> >> >> >> On 08/13/2018 03:22 PM, Zhenshi Zhou wrote:
> >> >> >> > Hi,
> >> >> >> > Finally, I got a running server with the files under
> >> >> >> > /sys/kernel/debug/ceph/xxx/:
> >> >> >> >
> >> >> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat mdsc
> >> >> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat monc
> >> >> >> > have monmap 2 want 3+
> >> >> >> > have osdmap 4545 want 4546
> >> >> >> > have fsmap.user 0
> >> >> >> > have mdsmap 335 want 336+
> >> >> >> > fs_cluster_id -1
> >> >> >> > [root@docker27 525c4413-7a08-40ca-9a98-0a6df009025b.client213522]# cat osdc
> >> >> >> > REQUESTS 6 homeless 0
> >> >> >> > 82580   osd10   1.7f9ddac7      [10,13]/10      [10,13]/10      10000053a04.00000000    0x400024        1       write
> >> >> >> > 81019   osd11   1.184ed679      [11,7]/11       [11,7]/11       1000005397b.00000000    0x400024        1       write
> >> >> >> > 81012   osd12   1.cd98ed57      [12,9]/12       [12,9]/12       10000053971.00000000    0x400024        1       write,startsync
> >> >> >> > 82589   osd12   1.7cd5405a      [12,8]/12       [12,8]/12       10000053a13.00000000    0x400024        1       write,startsync
> >> >> >> > 80972   osd13   1.91886156      [13,4]/13       [13,4]/13       10000053939.00000000    0x400024        1       write
> >> >> >> > 81035   osd13   1.ac5ccb56      [13,4]/13       [13,4]/13       10000053997.00000000    0x400024        1       write
> >> >> >> >
> >> >> >> > The cluster reports nothing and still shows HEALTH_OK.
> >> >> >> > What I did was just vim a file stored on CephFS, and it hung there,
> >> >> >> > leaving the process in 'D' state.
> >> >> >> > By the way, the whole mounted directory is still in use with no errors.
> >> >> >>
> >> >> >> So there are no pending mds requests, mon seems to be ok, too.
> >> >> >>
> >> >> >> But the OSD requests seem to be stuck. Are you sure the Ceph user used
> >> >> >> for the mount point is allowed to write to the CephFS data pools? Are
> >> >> >> you using additional EC data pools?
> >> >> >>
> >> >> >> Regards,
> >> >> >> Burkhard
> >> >> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
