Multiple filesystems (or volumes) can be the right choice; it really depends on your use case. But be aware that each CephFS needs (at least) two pools, plus one standby daemon for each active daemon. A single FS with multiple active MDS daemons, by contrast, can get by with one or two standby daemons in total, because a standby automatically takes over whichever rank fails. As an example, with 8 filesystems you need at least 16 pools (maybe more if you want to use EC), and the number of pools you can create is limited by the number of OSDs you have available. You also need 16 MDS daemons (one active and one standby for each FS). In a single-FS scenario with 8 active MDS daemons, 9 or 10 daemons in total could be sufficient, and you need fewer pools.
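
To make the arithmetic above concrete, here is a minimal Python sketch. The function names and defaults are made up for illustration (they are not part of any Ceph tooling); the only inputs are the rules of thumb from this mail: two pools per FS, and one standby per active daemon in the multi-FS case.

```python
def multi_fs_requirements(num_filesystems, standby_per_fs=1, pools_per_fs=2):
    """Minimum pools and MDS daemons when every workload gets its own CephFS.

    Assumes one active MDS per filesystem, as in the example above.
    """
    return {
        "pools": num_filesystems * pools_per_fs,
        "mds_daemons": num_filesystems * (1 + standby_per_fs),
    }


def single_fs_requirements(active_mds, standby_total=1, pools=2):
    """Minimum pools and MDS daemons for one multi-active CephFS.

    Standby daemons are shared, because a standby can take over any failed rank.
    """
    return {"pools": pools, "mds_daemons": active_mds + standby_total}


if __name__ == "__main__":
    print("8 filesystems:      ", multi_fs_requirements(8))
    print("1 FS, 8 active MDS: ", single_fs_requirements(8, standby_total=2))
```

Extra pools (for example for EC data) would have to be added on top of these minimums.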

If your setup is rather static, meaning you don't have to create a new FS every other week and the total number of filesystems stays the same, multiple filesystems might be the better approach for you.

So I can't really recommend one over the other; you'll need to figure out which scenario you need to cover. But Ceph is quite flexible, so you can start at one point and develop from there.

Quoting Sophonet <[email protected]>:

Hi,

thanks for the information - it seems that by pinning subvolumes/directories you can distribute the load across different MDS daemons. But in that case, what would be the difference compared to setting up different top-level volumes and attaching them to different MDS daemons? What I am not clear about is whether setting up one fs volume and pinning subvolumes to different MDS daemons is basically equivalent to using multiple fs volumes and attaching them to different MDS daemons. Quotas, auth caps etc. can be set for volumes as well as for subvolumes.

The only recommendation I have found on [0] says

"...it is recommended to consolidate file system workloads onto a single CephFS file system, when possible. Consolidate the workloads to avoid over-allocating resources to MDS servers that can be underutilized."

Is there a workload difference when using multiple fs volumes vs. a single one and subvolumes? Intuitively I would think that multiple fs volumes might provide some more error resilience in case of failures - in which case only one fs (of several) would fail instead of the whole cluster (if there is just a single volume and subvolumes are used).

Any insights? Thanks,

Sophonet

[0] https://www.ibm.com/docs/en/storage-ceph/8.1.0?topic=systems-cephfs-volumes-subvolumes-subvolume-groups

On 23.09.2025 at 15:45, Eugen Block <[email protected]> wrote:

Hi,

with multiple active MDS daemons you can use pinning. This allows you to pin specific directories (or subvolumes) to a specific rank to spread the load. You can find the relevant docs here [0].
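
As a rough illustration of pinning: on a mounted CephFS, a directory is pinned by setting the ceph.dir.pin extended attribute to the target rank (the equivalent of `setfattr -n ceph.dir.pin -v <rank> <dir>`). The Python sketch below assigns directories to ranks round-robin; the mount point, the directory names and the round-robin policy are assumptions for the example, not a recommendation.

```python
import os

PIN_XATTR = "ceph.dir.pin"  # extended attribute read by the MDS for export pins


def plan_pins(directories, num_ranks):
    """Assign each directory to an MDS rank in round-robin order."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return {path: index % num_ranks for index, path in enumerate(directories)}


def apply_pin(path, rank):
    """Pin `path` to `rank`; only works on a mounted CephFS (Linux)."""
    os.setxattr(path, PIN_XATTR, str(rank).encode())


if __name__ == "__main__":
    # Hypothetical project directories below a hypothetical mount point.
    projects = [f"/mnt/cephfs/projects/{name}" for name in ("alpha", "beta", "gamma")]
    for path, rank in plan_pins(projects, num_ranks=2).items():
        print(f"would pin {path} to rank {rank}")
        # apply_pin(path, rank)  # uncomment on a client with the FS mounted
```

Setting the attribute to -1 removes the pin again, so the directory follows its parent.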

Note that during an upgrade, max_mds is reduced to 1 (automatically if you use the orchestrator, see [1]), which can have a significant impact because all the load previously spread across multiple daemons is now shifted onto a single node. This can crash a file system, just so you're aware.
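
For a manual upgrade (without the orchestrator), the reduction and the later restore are done with `ceph fs set <fs_name> max_mds <n>`. The sketch below only builds the two command lines; the file system name "cephfs" and the previous value of 4 are placeholders, and the helper names are made up for this example.

```python
def set_max_mds_cmd(fs_name, max_mds):
    """Build the argv for `ceph fs set <fs_name> max_mds <n>`."""
    if max_mds < 1:
        raise ValueError("max_mds must be at least 1")
    return ["ceph", "fs", "set", fs_name, "max_mds", str(max_mds)]


def upgrade_plan(fs_name, previous_max_mds):
    """Commands to run before and after upgrading the MDS daemons."""
    return {
        "before_upgrade": set_max_mds_cmd(fs_name, 1),
        "after_upgrade": set_max_mds_cmd(fs_name, previous_max_mds),
    }


if __name__ == "__main__":
    # Placeholder values; pass each list to subprocess.run() on an admin node.
    for step, argv in upgrade_plan("cephfs", previous_max_mds=4).items():
        print(step, "->", " ".join(argv))
```

Between the two commands, the single remaining active MDS has to carry the whole load, which is the risk described above.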

So there are several options. You can run two or three "fat" MDS nodes in active/standby mode which can handle all the load. Or you can run more "fat" nodes, any one of which could handle all the load during an upgrade, and spread the load again after the upgrade is finished. Or you can run multiple "not so fat" nodes to spread the workload, but with a higher risk of an issue during an upgrade.

Regards,
Eugen

[0] https://docs.ceph.com/en/latest/cephfs/multimds/
[1] https://docs.ceph.com/en/latest/cephfs/upgrading/#upgrading-the-mds-cluster

Quoting Sophonet <[email protected]>:

Hi list,

for multiple project-level file shares (with individual access rights) I am planning to use CephFS.

Technically this can be implemented either with multiple top-level CephFS file systems or with a single CephFS in the cluster plus subvolumes.

What is the preferred choice? I have not found any guidance on docs.ceph.com. The only location that suggests using subvolumes is https://www.ibm.com/docs/en/storage-ceph/8.1.0?topic=systems-cephfs-volumes-subvolumes-subvolume-groups. However, how can I avoid having a single MDS be responsible for serving all subvolumes? Is there any current literature (books or web docs) that contains recommendations and examples? A couple of Ceph-related books are available in well-known online book stores, but many of them are rather old (6 years or even more).

Thanks a lot,

Sophonet

_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

