My practical suggestion would be to do nothing for now (perhaps tweaking
the config settings to shut up the warnings about PGs per OSD). Ceph
will gain the ability to reduce a pool's PG count soon, and in the meantime,
anecdotally, I have a production cluster where we overshot the current
recommendation by 10x due to confusing documentation at the time, and
it's doing fine :-)

Stable multi-FS support is also coming, so really, multiple ways to fix
your problem will probably materialize Real Soon Now, and in the
meantime having more PGs than recommended isn't the end of the world.
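
For a rough sense of how far over the recommendation you are, the per-OSD
figure is just the total number of PG replicas divided by the number of
OSDs. A minimal sketch in Python; the pool names, PG counts, replica sizes
and the limit of 250 below are made-up illustrative values, not taken from
your cluster, and the actual limit depends on your release and config:

```python
def pgs_per_osd(pools, num_osds):
    """Average PG replicas per OSD: sum(pg_num * size) / number of OSDs."""
    total_replicas = sum(pg_num * size for pg_num, size in pools.values())
    return total_replicas / num_osds


# Hypothetical pools: name -> (pg_num, replica size). Illustrative only.
pools = {
    "cephfs_data": (512, 3),
    "cephfs_metadata": (128, 3),
}

# Assumed warning threshold; check the value your own cluster uses.
ASSUMED_LIMIT = 250

# Compare the ratio before and after removing two of six OSDs.
for num_osds in (6, 4):
    ratio = pgs_per_osd(pools, num_osds)
    status = "over" if ratio > ASSUMED_LIMIT else "within"
    print(f"{num_osds} OSDs: {ratio:.0f} PGs per OSD ({status} assumed limit)")
```

The point is only that removing OSDs pushes the ratio up even though no
pool has changed, which is why the warning shows up when you shrink.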

(resending because the previous reply wound up off-list)

On 09/02/2019 10.39, Brian Topping wrote:
> Thanks again to Jan, Burkhard, Marc and Hector for responses on this. To
> review, I am removing OSDs from a small cluster and running up against
> the “too many PGs per OSD” problem due to lack of clarity. Here’s a
> summary of what I have collected on it:
>  1. The CephFS data pool can’t be changed, only added to. 
>  2. CephFS metadata pool might be rebuildable
>     via, but the
>     post is a couple of years old, and even then, the author stated that
>     he wouldn’t do this unless it was an emergency.
>  3. Running multiple clusters on the same hardware is deprecated, so
>     there’s no way to make a new cluster with properly-sized pools and
>     cpio across.
>  4. Running multiple filesystems on the same hardware is considered
>     experimental: 
>     It’s unclear what permanent changes this will effect on the cluster
>     that I’d like to use moving forward. This would be a second option
>     to mount and cpio across.
>  5. Importing pools (i.e. a ZFS-style `zpool export …`, `zpool import …`) from other
>     clusters is likely not supported, so even if I created a new cluster
>     on a different machine, getting the pools back in the original
>     cluster is fraught.
>  6. There’s really no way to tell Ceph where to put pools, so when the
>     new drives are added to CRUSH, everything starts rebalancing unless
>     `max pg per osd` is set to some small number that is already
>     exceeded. But if I start copying data to the new pool, doesn’t it fail?
>  7. Maybe the former problem can be avoided by changing the weights of
>     the OSDs...
> All these options so far seem either a) dangerous or b) like I’m going
> to have a less-than-pristine cluster to kick off the next ten years
> with. Unless I am mistaken in that, the only options are to copy
> everything at least once or twice more:
>  1. Copy everything back off CephFS to a `mdadm` RAID 1 with two of the
>     6TB drives. Blow away the cluster and start over with the other two
>     drives, copy everything back to CephFS, then re-add the freed drive
>     used as a store. Might be done by the end of next week.
>  2. Create a new, properly sized cluster on a second machine, copy
>     everything over ethernet, then move the drives and the
>     `/var/lib/ceph` and `/etc/ceph` back to the cluster seed.
> I appreciate small clusters are not the target use case of Ceph, but
> everyone has to start somewhere!
> _______________________________________________
> ceph-users mailing list

Hector Martin