> Thus displacing the problem from developers to the admins as now admins have 
> to maintain the whole container support environment, including the "slightly 
> varying library packages etc." This whole exercise is pointless as Ceph 
> cluster storage nodes are natively dedicated to Ceph

There are LOTS of converged clusters, notably Proxmox and Rook within K8s.

> thus the bare-metal machine is the real Ceph container. Just install the 
> specific OS supported by Ceph

Fleets don’t care for that approach.  They don’t want to have to maintain 
automation, package, and provisioning systems for multiple OSes, multiply the 
infosec burden, etc.

With Podman and log-to-file, in addition to the cephadm orchestrator, one can 
still manage individual daemons with systemctl.
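
A rough sketch of what that looks like in practice (the daemon id "osd.12" is 
just a placeholder; the actual unit names come from your cluster's fsid):

    # send daemon logs to files under /var/log/ceph instead of only the journal
    ceph config set global log_to_file true
    ceph config set global mon_cluster_log_to_file true

    # cephadm/Podman-deployed daemons are still ordinary systemd units,
    # named ceph-<fsid>@<type>.<id>, so they respond to systemctl as usual:
    systemctl status ceph-$(ceph fsid)@osd.12.service
    systemctl restart ceph-$(ceph fsid)@osd.12.service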

> and you've got your Ceph "container" made out of metal.
> 
> You can make an argument that monitor and/or manager daemons could run in 
> containers to allow for migration to alternative nodes.

Indeed, this is a big advantage.

> Properly sized Ceph clusters come with a number of dedicated monitor nodes 
> anyway

In the Luminous days I felt that way, since mons were subtle and quick to 
anger.  I liked to decouple them from OSDs, which were more likely to need node 
reboots.  I no longer do.  I’ve had a pre-cephadm cluster with 5x mon/RGW 
nodes, two of which broke, and it took me *18 months* to convince the TAC to 
fix them.  The cluster was one more failure away from an outage, whereas with a 
converged cluster the full quorum would have been maintained automagically.  
Defining the service spec to place mons based on a “mon” host label, and 
judiciously applying that label to at most one or two nodes per rack / failure 
domain, maintains prudent anti-affinity, as sketched below.
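
Something like this, roughly (hostnames are invented and the exact CLI syntax 
varies a bit by release):

    # tag at most one or two hosts per rack / failure domain
    ceph orch host label add node-r1-01 mon
    ceph orch host label add node-r2-01 mon
    ceph orch host label add node-r3-01 mon

    # let the orchestrator place mons wherever the label lives,
    # rather than pinning them to explicit hostnames
    ceph orch apply mon --placement="label:mon"

Lose a labeled node and you just apply the label to another host in the same 
failure domain; cephadm redeploys the mon there and quorum heals on its own.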


> therefore migration usually means add a new monitor node to the cluster and 
> remove the old one, thus making the monitor containers point moot, too.

Adding a new node to the cluster is far from trivial when the DC is overseas 
and unstaffed.  It would also mean having a warm / hot spare node, which, by 
your reasoning, would be a different spec than the OSD nodes.  The fewer chassis 
types in a DC the better, from an interchangeability, servicing, and sparing 
perspective.

And, yes, I’ve run fleets of as many as 15000 nodes. ymmv.

— aad
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
