Thank you, Burkhard. There are really 3 active MDSs, so this is a 
misconfiguration.
I will try a standby one.

kind regards, Grigori.

________________________________________
From: ceph-users <[email protected]> on behalf of Burkhard Linke
<[email protected]>
Sent: June 7, 2018, 18:59
To: [email protected]
Subject: Re: [ceph-users] I/O hangs when one of three nodes is down

Hi,


On 06/07/2018 02:52 PM, Фролов Григорий wrote:
> Hello. Could you please help me troubleshoot the issue.
>
> I have 3 nodes in a cluster.
*snipsnap*

> root@testk8s2:~# ceph -s
>      cluster 0bcc00ec-731a-4734-8d76-599f70f06209
>       health HEALTH_ERR
>              80 pgs degraded
>              80 pgs stuck degraded
>              80 pgs stuck unclean
>              80 pgs stuck undersized
>              80 pgs undersized
>              recovery 1075/3225 objects degraded (33.333%)
>              mds rank 2 has failed
>              mds cluster is degraded
>              1 mons down, quorum 1,2 testk8s2,testk8s3
>       monmap e1: 3 mons at 
> {testk8s1=10.105.6.116:6789/0,testk8s2=10.105.6.117:6789/0,testk8s3=10.105.6.118:6789/0}
>              election epoch 120, quorum 1,2 testk8s2,testk8s3
>        fsmap e14084: 2/3/3 up {0=testk8s2=up:active,1=testk8s3=up:active}, 1 
> failed
>       osdmap e9939: 3 osds: 2 up, 2 in; 80 remapped pgs
>              flags sortbitwise,require_jewel_osds
>        pgmap v17491: 80 pgs, 3 pools, 194 MB data, 1075 objects
>              1530 MB used, 16878 MB / 18408 MB avail
>              1075/3225 objects degraded (33.333%)
>                    80 active+undersized+degraded
I assume all of your MDS daemons are configured as active. In this setup
the filesystem metadata is shared among the hosts, so if one of the MDS
daemons becomes unavailable, the part of the filesystem served by that
MDS is no longer accessible.

You can prevent this kind of lockup by using a standby MDS server that
becomes active as soon as one of the active MDS servers fails.
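For example, with a third MDS daemon running you could reduce the number
of active ranks so that one daemon returns to the standby pool. A rough
sketch for a Jewel-era cluster (the filesystem name "cephfs" is an
assumption, substitute your own):

```
# Allow only two active MDS ranks; extra daemons become standbys.
ceph fs set cephfs max_mds 2
# Stop rank 2 so its daemon drops back into the standby pool
# (ceph mds deactivate applies to Jewel/Luminous-era releases).
ceph mds deactivate 2
```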

To keep the failover time as low as possible, you can associate a
standby MDS with a specific active MDS. You would need one standby MDS
for each active MDS, but the failover time would be minimal. An
unassociated standby can replace any failed active MDS, but it needs to
load its inode cache before becoming active, which may take some time.
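As a sketch of the associated-standby setup, the standby daemon can be
pinned to a rank in ceph.conf (the section name below is only an
example; the option names are the Jewel-style ones):

```
[mds.testk8s1]
# Follow rank 0 and continuously replay its journal, so a failover
# does not have to rebuild the inode cache from scratch.
mds standby for rank = 0
mds standby replay = true
```

With "mds standby replay" enabled, the standby keeps a warm copy of the
active daemon's metadata, which is what keeps the failover time minimal.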

Regards,
Burkhard
_______________________________________________
ceph-users mailing list
[email protected]
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com