Hi! I'm running a Ceph cluster using cephadm and recently upgraded from squid to tentacle 20.2.0. Everything worked fine until I started the nfs module. The NFS daemons were reported to be running, but after 10 minutes or so all of them except one were reported dead. The NFS service on port 2049 was never available on any of the nodes, even while the daemons were supposedly running. As I found out later, the NFS daemons were never started at all, because the setup process required systemd-firewalld to be installed on the system, which of course it wasn't.

After some headaches with the newly installed firewalld, I decided to roll back, remove firewalld, and postpone the NFS deployment. I then tried to stop the NFS daemons with 'ceph orch daemon stop', which did nothing, even after waiting some 10 minutes. I had to reissue the command several times to make the reportedly dead NFS daemons vanish from the 'ceph orch ps' list. The one daemon that was reported as still running, however, would only die after 'ceph orch daemon stop --force'. It was in an 'error' state thereafter and could not be removed from the 'ceph orch ps' list by any means.

So I decided to delete the managing NFS service from the 'ceph orch ls' list, in the hope that this would also tear down the remaining NFS daemon. This was obviously a bad idea, since the service is now stuck in the 'deleting' state. It cannot be deleted because there is still the one daemon in the error state, and that daemon cannot be deleted because it was never running at all. As a last resort I forcefully removed the Docker container on the node with the cephadm command, but even though there are no traces left of that NFS daemon, it is still listed when running 'ceph orch ps'.

I also noticed that 'ceph orch device ls' is out of sync with reality, and 'ceph orch ps' is still listing OSDs that I have already shut down and deleted. I therefore suspect that the orchestrator has stopped collecting state information from the nodes. Is there a way to force the orchestrator to sync its state information with the nodes? Where do I find meaningful logs for the orchestrator?
With best regards, Carsten Götze
