Hi,
This cluster was deployed with cephadm 17.2.5, containerized.
It has ended up in this state (no active mgr):
[root@8cd2c0657c77 /]# ceph -s
  cluster:
    id:     ad3a132e-e9ee-11ed-8a19-043f72fb8bf9
    health: HEALTH_WARN
            6 hosts fail cephadm check
            no active mgr
            1/3 mons down, quorum h18w,h19w
            Degraded data redundancy: 781908/2345724 objects degraded (33.333%), 101 pgs degraded, 209 pgs undersized

  services:
    mon: 3 daemons, quorum h18w,h19w (age 19m), out of quorum: h15w
    mgr: no daemons active (since 5h)
    mds: 1/1 daemons up, 1 standby
    osd: 9 osds: 6 up (since 5h), 6 in (since 5h)
    rgw: 2 daemons active (2 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   8 pools, 209 pgs
    objects: 781.91k objects, 152 GiB
    usage:   312 GiB used, 54 TiB / 55 TiB avail
    pgs:     781908/2345724 objects degraded (33.333%)
             108 active+undersized
             101 active+undersized+degraded
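(For reference, regarding the "6 hosts fail cephadm check" warning: my understanding is that the per-host check can be rerun directly with the cephadm binary on each host, something like the sketch below. This is just a check, not a fix.)

  # run as root on each affected host
  cephadm check-host
  # list the daemons cephadm knows about on that host
  cephadm ls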
I checked h20w; a mgr container is running there, with this in its log:
debug 2023-05-10T12:43:23.315+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
debug 2023-05-10T12:48:23.318+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
debug 2023-05-10T12:53:23.318+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
debug 2023-05-10T12:58:23.319+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
debug 2023-05-10T13:03:23.319+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
debug 2023-05-10T13:08:23.319+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
debug 2023-05-10T13:13:23.319+0000 7f5e152ec000  0 monclient(hunting): authenticate timed out after 300
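(For completeness, this is roughly how that log can be read via cephadm on h20w; mgr.h20w.<suffix> is a placeholder, the exact daemon name comes from cephadm ls.)

  # find the exact mgr daemon name on h20w
  cephadm ls | grep mgr
  # follow its journal (args after -- are passed to journalctl)
  cephadm logs --name mgr.h20w.<suffix> -- -f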
Any ideas on how to get a mgr up and running again through cephadm?
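What I was thinking of trying, as a sketch only (this assumes the standard cephadm systemd unit naming, with the fsid being the cluster id from ceph -s above and <suffix> again a placeholder):

  # on h20w: restart the stuck mgr via its systemd unit
  systemctl restart ceph-ad3a132e-e9ee-11ed-8a19-043f72fb8bf9@mgr.h20w.<suffix>.service
  # and, since authentication to the mons is timing out, verify mon reachability from h20w
  cephadm shell -- ceph -s --connect-timeout 10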
Thanks,
Ben