On Mon, Mar 13, 2017 at 8:15 PM, Andras Pataki
<[email protected]> wrote:
> Dear Cephers,
>
> We're using the ceph file system with the fuse client, and lately some of
> our processes are getting stuck seemingly waiting for fuse operations. At
> the same time, the cluster is healthy, no slow requests, all OSDs up and
> running, and both the MDS and the fuse client think that there are no
> pending operations. The situation is semi-reproducible. When I run
> various cluster jobs, some get stuck after a few hours of correct operation.
> The cluster is on ceph 10.2.5 and 10.2.6; the fuse clients are 10.2.6, but I
> have also tried 10.2.5 and 10.2.3, all of which show the same issue. This is
> on CentOS (7.2 for the clients, 7.3 for the MDS/OSDs).
>
> Here are some details:
>
> The node with the stuck processes:
>
> [root@worker1070 ~]# ps -auxwww | grep 30519
> apataki 30519 39.8 0.9 8728064 5257588 ? Dl 12:11 60:50 ./Arepo
> param.txt 2 6
> [root@worker1070 ~]# cat /proc/30519/stack
> [<ffffffffa0a1d7bb>] fuse_file_aio_write+0xbb/0x340 [fuse]
> [<ffffffff811ddd3d>] do_sync_write+0x8d/0xd0
> [<ffffffff811de55d>] vfs_write+0xbd/0x1e0
> [<ffffffff811defff>] SyS_write+0x7f/0xe0
> [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [root@worker1070 ~]# ps -auxwww | grep 30533
> apataki 30533 39.8 0.9 8795316 5261308 ? Sl 12:11 60:55 ./Arepo
> param.txt 2 6
> [root@worker1070 ~]# cat /proc/30533/stack
> [<ffffffffa0a12241>] wait_answer_interruptible+0x91/0xe0 [fuse]
> [<ffffffffa0a12653>] __fuse_request_send+0x253/0x2c0 [fuse]
> [<ffffffffa0a126d2>] fuse_request_send+0x12/0x20 [fuse]
> [<ffffffffa0a1b966>] fuse_send_write+0xd6/0x110 [fuse]
> [<ffffffffa0a1d45d>] fuse_perform_write+0x2ed/0x590 [fuse]
> [<ffffffffa0a1d9a1>] fuse_file_aio_write+0x2a1/0x340 [fuse]
> [<ffffffff811ddd3d>] do_sync_write+0x8d/0xd0
> [<ffffffff811de55d>] vfs_write+0xbd/0x1e0
> [<ffffffff811defff>] SyS_write+0x7f/0xe0
> [<ffffffff816458c9>] system_call_fastpath+0x16/0x1b
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Presumably the second process is waiting on the first, which is holding some
> lock ...
>
> The fuse client on the node:
>
> [root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok status
> {
> "metadata": {
> "ceph_sha1": "656b5b63ed7c43bd014bcafd81b001959d5f089f",
> "ceph_version": "ceph version 10.2.6
> (656b5b63ed7c43bd014bcafd81b001959d5f089f)",
> "entity_id": "admin",
> "hostname": "worker1070",
> "mount_point": "\/mnt\/ceph",
> "root": "\/"
> },
> "dentry_count": 40,
> "dentry_pinned_count": 23,
> "inode_count": 123,
> "mds_epoch": 19041,
> "osd_epoch": 462327,
> "osd_epoch_barrier": 462326
> }
>
> [root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok
> mds_sessions
> {
> "id": 3616543,
> "sessions": [
> {
> "mds": 0,
> "addr": "10.128.128.110:6800\/909443124",
> "seq": 338,
> "cap_gen": 0,
> "cap_ttl": "2017-03-13 14:47:37.575229",
> "last_cap_renew_request": "2017-03-13 14:46:37.575229",
> "cap_renew_seq": 12694,
> "num_caps": 713,
> "state": "open"
> }
> ],
> "mdsmap_epoch": 19041
> }
>
> [root@worker1070 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok
> mds_requests
> {}
>
>
> The overall cluster health and the MDS:
>
> [root@cephosd000 ~]# ceph -s
> cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
> health HEALTH_WARN
> noscrub,nodeep-scrub,require_jewel_osds flag(s) set
> monmap e17: 3 mons at
> {hyperv029=10.4.36.179:6789/0,hyperv030=10.4.36.180:6789/0,hyperv031=10.4.36.181:6789/0}
> election epoch 29148, quorum 0,1,2 hyperv029,hyperv030,hyperv031
> fsmap e19041: 1/1/1 up {0=cephosd000=up:active}
> osdmap e462328: 624 osds: 624 up, 624 in
> flags noscrub,nodeep-scrub,require_jewel_osds
> pgmap v44458747: 42496 pgs, 6 pools, 924 TB data, 272 Mobjects
> 2154 TB used, 1791 TB / 3946 TB avail
> 42496 active+clean
> client io 86911 kB/s rd, 556 MB/s wr, 227 op/s rd, 303 op/s wr
>
> [root@cephosd000 ~]# ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok ops
> {
> "ops": [],
> "num_ops": 0
> }
>
>
> The odd thing is that if I restart the MDS while in this state, the client
> process wakes up and proceeds with its work without any errors. It is as if a
> request was lost and then retransmitted/restarted when the MDS restarted and
> the fuse layer reconnected to it.
Interesting. A couple of ideas for more debugging:
* Next time you go through this process of restarting the MDS while
there is a stuck client, first increase the client's logging (ceph
daemon <path to /var/run/ceph/ceph-<id>.asok> config set debug_client
20). Then we should get a clear sense of exactly what's happening on
the MDS restart that's enabling the client to proceed.
* When inspecting the client's "mds_sessions" output, also check the
"session ls" output on the MDS side to make sure the MDS and client
both agree that it has an open session.
John
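Concretely, using the asok paths from the outputs above, those two checks would look something like this (a sketch; substitute your client's entity id if it is not "admin"):

```shell
# On the stuck client node: raise client-side debug logging before restarting
# the MDS, so the reconnect sequence is captured in the client log.
ceph daemon /var/run/ceph/ceph-client.admin.asok config set debug_client 20

# On the MDS node: confirm the MDS also lists this client's session as open,
# matching the client's own "mds_sessions" view.
ceph daemon /var/run/ceph/ceph-mds.cephosd000.asok session ls
```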
>
> When I try to attach a gdb session to either of the client processes, gdb
> just hangs. However, right after the MDS restart gdb attaches to the
> process successfully, and shows that the hang happened while closing a
> file. In fact, it looks like both processes were trying to write to the
> same file, opened with fopen("filename", "a"), and then close it:
>
> (gdb) where
> #0 0x00002aaaadc53abd in write () from /lib64/libc.so.6
> #1 0x00002aaaadbe2383 in _IO_new_file_write () from /lib64/libc.so.6
> #2 0x00002aaaadbe37ec in __GI__IO_do_write () from /lib64/libc.so.6
> #3 0x00002aaaadbe30e0 in __GI__IO_file_close_it () from /lib64/libc.so.6
> #4 0x00002aaaadbd7020 in fclose@@GLIBC_2.2.5 () from /lib64/libc.so.6
> ...
>
> It seems like the fuse client doesn't handle the case well where two
> processes write to the same file and then close it? This is just
> speculation. Any ideas on how to proceed? Is there perhaps a known issue
> related to this?
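For reference, the access pattern described above can be sketched as follows. The file path is hypothetical, and shell appends only approximate the buffered fopen/fclose cycle, since each `>>` performs its own open/write/close:

```shell
# Sketch of the reported access pattern: two concurrent processes appending
# to one shared file. Each ">>" does its own open(O_APPEND)/write/close,
# roughly mirroring the fopen("filename", "a") + fclose() cycle described.
# FILE is a hypothetical path; point it at the CephFS fuse mount to test there.
FILE="${FILE:-/tmp/append-test.log}"   # e.g. FILE=/mnt/ceph/append-test.log
: > "$FILE"                            # start with an empty file
for w in 1 2; do
  (
    for i in $(seq 1 100); do
      echo "writer$w line $i" >> "$FILE"
    done
  ) &
done
wait
wc -l < "$FILE"                        # 200 lines expected if no write hangs
```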
>
> Thanks,
>
> Andras
> [email protected]
>
>
> _______________________________________________
> ceph-users mailing list
> [email protected]
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>