[ceph-users] Re: Integrating OSDs from a previous Ceph installation

Eugen Block via ceph-users Mon, 08 Jun 2026 12:50:25 -0700

Hi again,

I just wanted to conclude this thread. We managed to bring the OSDs back up and reactivate the CephFS so Jacek has access to his data again.

It would be too much to summarize all of it, but a couple of things are worth mentioning, at least for the curious readers here. ;-)

- After Jacek had bootstrapped a fresh cluster and tried to reactivate the existing OSDs, he accidentally caused more inconsistencies. Due to a mix-up of cephadm and non-cephadm commands and procedures, a lot of cleanup was necessary.
- Among other things, removing the ceph-osd package fixed at least one issue. But disabling the ceph-volume leftovers (from manually using ceph-volume outside of cephadm) was also necessary.
- We managed to extract the mon store from the OSDs and brought back the osdmap. But that wasn't enough.
- We had to fix the directory content of the OSDs. The first one actually started successfully, so we proceeded with the second. And then it happened again: all monitors crashed.
- Shortly before the crash I had noticed two strange OSD keyrings (no idea how they got there). And as soon as we tried to start one of those OSDs with a strange keyring, the monitors failed. So apparently the original issue (crashing monitors) was carried over into the new cluster.
- We stopped the OSDs, restarted the monitors and removed the faulty keys. The monitors were stable again after that.
- We fixed the remaining OSD contents (keyrings, unit.run files etc.), and now all OSDs came up successfully.
- We had to deploy new MDS daemons (necessary after mon store loss) and then recreate the CephFS based on the existing metadata and data pools. At first glance, all files were present.
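
For readers who want to see roughly what "extract the mon store from the OSDs" and "recreate the CephFS based on the existing pools" look like, here is a minimal dry-run sketch. It follows the general procedure from the Ceph documentation (mon store recovery using OSDs, and CephFS recovery after mon store loss); it is not a transcript of the commands used in this recovery. The helper names (run, collect_mon_store, rebuild_mon_store, recreate_cephfs) and all paths and pool names are assumptions for illustration. In a cephadm deployment the OSD tools would typically be run with the OSD stopped and from inside its container (for example via cephadm shell --name osd.<id>), which the sketch does not cover.

```shell
#!/bin/sh
# Dry-run sketch: by default every command is only printed.
# Set DRY_RUN=0 to actually execute (at your own risk, on a stopped cluster).

run() {
    # Print the command unless DRY_RUN=0 is set explicitly.
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "+ $*"
    else
        "$@"
    fi
}

collect_mon_store() {
    # $1: scratch directory that accumulates the mon store (assumed path)
    # remaining arguments: data paths of the (stopped) OSDs
    store="$1"
    shift
    for osd_path in "$@"; do
        run ceph-objectstore-tool --data-path "$osd_path" --no-mon-config \
            --op update-mon-db --mon-store-path "$store"
    done
}

rebuild_mon_store() {
    # $1: collected store, $2: keyring containing the mon. and client.admin keys
    run ceph-monstore-tool "$1" rebuild -- --keyring "$2"
}

recreate_cephfs() {
    # $1: file system name, $2: existing metadata pool, $3: existing data pool
    # --recover keeps the existing metadata instead of initializing a new rank 0.
    run ceph fs new "$1" "$2" "$3" --force --recover
    # The file system is left not joinable until this is set.
    run ceph fs set "$1" joinable true
}
```

Called as collect_mon_store /tmp/mon-store <osd-path> <osd-path>, the first helper prints one ceph-objectstore-tool invocation per OSD. The rebuilt store still has to be copied into a monitor's data directory by hand, and the keyring needs the appropriate caps; both steps are left out here.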

So in the end, the recovery was successful, although in retrospect all the cleanup and the cluster bootstrap hadn't been necessary. So my advice is: first investigate the logs to find the root cause before making such destructive decisions (wipe the cluster and rebuild). But I consider it good practice and proof of Ceph's resilience when it comes to user errors. ;-)
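
As a starting point for "investigate the logs first", here is a small sketch that only prints where the monitor logs and crash dumps of a cephadm-managed cluster would usually be found. The function names (mon_unit, show_mon_log_commands) are made up for this example, and it assumes the monitor daemon is named mon.<hostname>, which is the cephadm default but may differ in a given cluster.

```shell
#!/bin/sh
# Print (not run) the commands for inspecting a cephadm-managed monitor.

mon_unit() {
    # $1: cluster fsid, $2: host name
    # cephadm names systemd units ceph-<fsid>@<daemon-name>.service
    printf 'ceph-%s@mon.%s.service\n' "$1" "$2"
}

show_mon_log_commands() {
    # $1: cluster fsid, $2: host name
    fsid="$1"
    host="$2"
    # Container logs of the monitor as collected by cephadm.
    printf '%s\n' "cephadm logs --fsid $fsid --name mon.$host"
    # The same daemon through the systemd journal, limited to the last hour.
    printf '%s\n' "journalctl -u $(mon_unit "$fsid" "$host") --since '1 hour ago' --no-pager"
    # Crash dumps written on this host (readable even without mon quorum).
    printf '%s\n' "ls /var/lib/ceph/$fsid/crash"
}
```

For the cluster in this thread, show_mon_log_commands 8aad3073-39a1-11f1-bf6e-f2704a1efa9b blade3n1 would list the three commands for the monitor on blade3n1.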


Regards,
Eugen


Quoting Eugen Block <[email protected]>:

We've continued this topic off list, it's way quicker that way. Due to the different attempts to start the OSDs without the proper preparation (and by mixing cephadm with non-cephadm commands), there were some leftovers to clean up before making progress. In the meantime we were able to gather the monmap info from the first OSD. These steps will be necessary for the remaining OSDs before we'll be able to proceed with activating them.


I will conclude this thread once we've accomplished that.

Quoting Jacek Rużyczka <[email protected]>:

I've already tried that. No use. Cephadm has a problem with the hostname:

mixtile@blade3n1:~$ sudo ceph cephadm osd activate blade3n1
Error EIO: Module 'cephadm' has experienced an error and cannot handle commands: invalid literal for int() with base 10: 'blade3n1'

I know that one or two people have had this error so far, but I have not
found a remedy.

cephadm deploy did not work either:

mixtile@blade3n1:~$ sudo cephadm deploy \
    --osd-fsid 9f7fd40d-0698-40b9-8718-62942b03e263 \
    --name osd.blade3n1 \
    --fsid 8aad3073-39a1-11f1-bf6e-f2704a1efa9b \
    --keyring /var/lib/ceph/8aad3073-39a1-11f1-bf6e-f2704a1efa9b/osd.blade3n1/keyring
Deprecated command used: <function command_deploy at 0xffffa333cea0>
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b-osd-blade3n1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b-osd-blade3n1
Non-zero exit code 1 from /usr/bin/docker container inspect --format {{.State.Status}} ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b-osd.blade3n1
/usr/bin/docker: stdout
/usr/bin/docker: stderr Error response from daemon: No such container: ceph-8aad3073-39a1-11f1-bf6e-f2704a1efa9b-osd.blade3n1
Deploy daemon osd.blade3n1 ...

Shouldn't it create the necessary container itself?

BTW, I've found out that the utilities ceph-osd, ceph-base, and ceph-volume
were installed for some reason. I removed them, but that didn't help me
either.



