On Tue, Jul 31, 2018 at 12:33 AM William Lawton
<[email protected]> wrote:
>
> Hi.
>
>
>
> We have recently set up our first Ceph cluster (4 nodes), but our node failure
> tests have revealed an intermittent problem. When we take down a node (i.e.
> by powering it off), most of the time all clients reconnect to the cluster
> within milliseconds, but occasionally it can take them 30 seconds or more.
> All clients are CentOS 7 instances and have the Ceph cluster mount point
> configured in /etc/fstab as follows:
The first thing I'd do is make sure you've got recent client code --
there are backports in RHEL, but I'm unclear on how much of that (if
any) makes it into CentOS. You may find it simpler to just install a
recent 4.x kernel from ELRepo. Even if you don't want to use that in
production, it would be useful for isolating any CephFS client
issues you're encountering.
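
For reference, something like this should get a current mainline kernel onto a
CentOS 7 client (the elrepo-release package version below is only an example --
check elrepo.org for the current one):

    # Check what the client is currently running
    uname -r
    # Add the ELRepo repository and install a mainline kernel
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    yum --enablerepo=elrepo-kernel install kernel-ml
    # Boot into the new kernel (entry 0 is the newest with the default GRUB config)
    grub2-set-default 0
    reboot

If the slow reconnects go away on the newer kernel, that points at a
client-side issue that has since been fixed upstream.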
John
>
>
>
> 10.18.49.35:6789,10.18.49.204:6789,10.18.49.101:6789,10.18.49.183:6789:/
> /mnt/ceph ceph name=admin,secretfile=/etc/ceph_key,noatime,_netdev 0 2
>
>
>
> On rare occasions, ls shows that a failover has left a client’s /mnt/ceph
> directory in the following state: “??????????? ? ? ? ? ? ceph”. When this
> occurs, we think the client has failed to reconnect within 45 seconds (the
> mds_reconnect_timeout period) and has therefore been evicted. We can
> reproduce this by reducing the MDS reconnect timeout to 1 second.
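>
> For reference, this is roughly how the setting can be inspected and lowered
> (the MDS name is ours; exact syntax may vary by Ceph release):
>
>     # Check the current value on the active MDS (run on its host)
>     ceph daemon mds.dub-ceph-03 config get mds_reconnect_timeout
>     # Lower it at runtime to provoke evictions during testing
>     ceph tell mds.* injectargs '--mds_reconnect_timeout 1'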
>
>
>
> We’d like to know why our clients sometimes struggle to reconnect after a
> cluster node failure and how to prevent this, i.e. how we can ensure that all
> clients consistently reconnect to the cluster quickly following a node
> failure.
>
>
>
> We are using the default configuration options.
>
>
>
> Ceph Status:
>
>   cluster:
>     id:     ea2d9095-3deb-4482-bf6c-23229c594da4
>     health: HEALTH_OK
>
>   services:
>     mon: 4 daemons, quorum dub-ceph-01,dub-ceph-03,dub-ceph-04,dub-ceph-02
>     mgr: dub-ceph-02(active), standbys: dub-ceph-04.ott.local, dub-ceph-01, dub-ceph-03
>     mds: cephfs-1/1/1 up {0=dub-ceph-03=up:active}, 3 up:standby
>     osd: 4 osds: 4 up, 4 in
>
>   data:
>     pools:   2 pools, 200 pgs
>     objects: 2.36 k objects, 8.9 GiB
>     usage:   31 GiB used, 1.9 TiB / 2.0 TiB avail
>     pgs:     200 active+clean
>
> Thanks
>
> William Lawton
>
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com