> Once you have a cluster bootstrapped, it can be 100% declarative.
> There are various CLI commands so you can perform various tasks
> surgically, but it’s also entirely possible to maintain almost the entire
> cluster state in a YAML file:
>
>   ceph orch ls --export > myawesomecluster.yaml
>
>   # edit the file with your favorite emacs
>
>   ceph orch apply -i myawesomecluster.yaml --dry-run
>   ceph orch apply -i myawesomecluster.yaml
>
>
> This is the best of both worlds: the YAML file readily fits into revision
> control and peer review, and one can add commit hooks to validate syntax
> or to even perform a dry run for a sanity check.

Thanks, but this raises a lot of questions :)

Running ceph orch ls --export gives me:

service_type: mgr
service_name: mgr
placement:
  count: 3
---
service_type: mon
service_name: mon
placement:
  count: 3
---
service_type: osd
service_name: osd
unmanaged: true
spec:
  filter_logic: AND
  objectstore: bluestore

Very light, isn't it?

One use case that gave me a headache was losing a node (assume the
hardware is completely dead). I was able to remove it from the cluster
easily using:

ceph orch host rm nodeX --force --offline

After that, I want my playbook to detect that the cluster is already
bootstrapped and then deploy everything needed on the replacement
node—whichever node that may be. With ceph-ansible, I could simply run
the site.yml playbook without giving it much thought.

With cephadm, however, I feel like I have to handle all the logic
manually. For instance, if the first node I try isn't in the cluster,
I need to determine whether it's because the node was lost or because
the cluster hasn't been bootstrapped yet.
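
For illustration, the kind of checks and commands this boils down to
looks roughly like this (just a sketch; the host name and address are
made up):

cephadm ls                           # daemons deployed on this host ("[]" if none)
cephadm shell -- ceph orch host ls   # only works if a bootstrapped cluster is reachable
ceph orch host add nodeY 192.0.2.42  # (re-)enrol the replacement node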

Then there are the OSDs. The output of ceph orch ls doesn't show where
the OSDs are or what devices they're using. I also read somewhere that
it's normal for cephadm to consider them "unmanaged". In my setup, I
use pre-provisioned LVM volumes (e.g., vg_ceph/lv_ceph). My use case
requires detecting whether all necessary OSDs are deployed. If not, I
want to zap and redeploy the missing ones.

I don’t see how ceph orch apply can do this.
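
A spec along these lines can at least pin the OSDs to the
pre-provisioned LVs (names here are examples, and it assumes LV paths
are accepted under data_devices.paths), but it still doesn't cover the
detect/zap/redeploy part:

service_type: osd
service_id: osd_lvm            # example id
placement:
  hosts:
    - nodeX
spec:
  data_devices:
    paths:
      - /dev/vg_ceph/lv_ceph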

So, for now, I’m stuck handling all of this myself. You can take a
look at what I’ve done here if you're interested:
https://github.com/seapath/ansible/blob/main/roles/cephadm/tasks/main.yml
My playbook deals with all those use cases based on my ansible inventory:
- bootstrapping a new cluster from scratch
- adding a missing node (monitor) to an already bootstrapped cluster
- adding missing mgrs
- zapping and adding missing OSDs (based on pre-provisioned LVM
  volumes); see the commands sketched below
- of course: doing nothing if everything is already in place
But it's such a pain to maintain.
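
For the zap-and-redeploy part specifically, the building blocks are
roughly these two commands (a sketch; host and LV names are examples):

ceph orch device zap nodeX /dev/vg_ceph/lv_ceph --force
ceph orch daemon add osd nodeX:/dev/vg_ceph/lv_ceph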


> It depends a bit on what you're doing with these modules. Are you using them
> interactively in a python prompt or are you building applications on top of
> them?

Indeed, I'm talking about a client application for tooling. My use
case is a python script that does some stuff with ceph/rbd and some
stuff with pacemaker and acts as a wrapper for our admins. For this it
uses the ceph and pacemaker python modules directly on the system,
which require the associated libraries... which in turn requires
installing the associated packages: I need ceph-common because I need
python3-ceph-common and librbd1, etc. to do my "import rbd".
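
To make it concrete, even a minimal version of that wrapper needs
something like this (a simplified sketch; the pool name is just an
example), and that is what drags in those packages:

import rados   # provided by python3-rados, which needs librados2
import rbd     # provided by python3-rbd, which needs librbd1

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('rbd')   # open a pool
print(rbd.RBD().list(ioctx))        # list the RBD images in that pool
ioctx.close()
cluster.shutdown()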

> Using that tool (ceph-bluestore-tool), to be fair, should be rare

I love this tool for extending an OSD in seconds :) Once again, I run my
OSDs on LVM and so LVM (lvextend) + ceph-bluestore-tool gives me a lot
of flexibility with storage :)

> we do have a recommended process for this
> (https://docs.ceph.com/en/latest/cephadm/troubleshooting/#running-various-ceph-tools).
> The high level overview is you stop the daemon and then run `cephadm shell`
> with `--name <daemon-name>` and it should spin up a container with all the
> same files and mounts as if we were actually running the daemon, but with an
> interactive bash session inside instead of the actual daemon process running.
> I was just messing with this for trying to add a wal device to an OSD earlier
> today (which wasn't working for another reason related to ceph-volume, but
> the process for running the tools in general worked)

Very nice!
I had my own way of hacking this ("cephadm shell -v /dev:/dev -m
/var/lib/ceph/e1887cc0-2988-11f0-a805-ead37406f331/osd.0/:/var/lib/ceph/osd/ceph-0/:z
-- ceph-bluestore-tool bluefs-bdev-expand --path
/var/lib/ceph/osd/ceph-0/"), but "--name osd.0" is so much nicer!
Thanks.