I have two managers deployed, active/standby, on a couple of the monitors. The 
managers were stopped with the orchestrator and, as expected, cannot be 
restarted through it while no manager is running (the orchestrator module runs 
inside the mgr). I started a manager instance manually with 'systemctl start 
ceph-f...@mgr.host.xxx.service'; the manager starts fine and is stable, but 
after a few seconds it is gracefully terminated, logging that it received 
SIGTERM. I paused the orchestrator and started the manager again; now it stays 
running. As soon as I resume the orchestrator, the manager is terminated. Why 
would the orchestrator keep terminating managers?
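For reference, this is roughly the sequence I am describing. The unit name 
below uses placeholders (<fsid>, <host>, <id> are not my real values; the 
actual unit is named after the cluster fsid and daemon id):

```shell
# Suspend all orchestrator activity so it stops managing daemons.
ceph orch pause

# Start the mgr by hand; with the orchestrator paused it stays running.
systemctl start ceph-<fsid>@mgr.<host>.<id>.service

# Resume the orchestrator; within seconds the mgr receives SIGTERM.
ceph orch resume
```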

The configuration says 2 managers should be running, not 0. This is what 'ceph 
orch ls' shows during the ~10 seconds before the manager is terminated:
mgr                                                   1/2  16s ago    18h  
count:2

Cephadm logs show:
2021-07-28T18:03:53.508042+0000 mon.host1 [INF] Activating manager daemon 
host2.ijsxjg
2021-07-28T18:03:54.326785+0000 mon.host1 [INF] Health check cleared: MGR_DOWN 
(was: no active mgr)
2021-07-28T18:03:54.326941+0000 mon.host1 [INF] Cluster is now healthy
2021-07-28T18:03:54.381231+0000 mon.host1 [INF] Manager daemon host2.ijsxjg is 
now available
2021-07-28T18:04:38.354264+0000 mon.host1 [INF] Manager daemon host2.ijsxjg is 
unresponsive.  No standby daemons available.
2021-07-28T18:04:38.355071+0000 mon.host1 [WRN] Health check failed: no active 
mgr (MGR_DOWN)
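In case it helps, this is roughly how I have been inspecting the 
orchestrator's side of things (standard cephadm commands, nothing custom):

```shell
# Service spec for the mgr service: placement count vs. daemons running.
ceph orch ls mgr

# Per-daemon status as cephadm sees it (host, state, last refresh).
ceph orch ps --daemon-type mgr

# Recent log entries from the cephadm module.
ceph log last cephadm
```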

Starting the other mgr gives the same result.

Jim


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io