Hi,
just a few comments inline.
Quoting Anthony D'Atri <anthony.da...@gmail.com>:
I held off replying here hoping that someone more authoritative
would step in, but I have a few thoughts that might help or
stimulate conversation.
Same here.
The recommendation for CephFS is to make a replicated default data pool
and add any EC data pools using layouts:
https://docs.ceph.com/en/latest/cephfs/createfs/
That's my understanding too. AIUI one reason is that head RADOS
objects or some analogue thereof always live there, so there's a
significant performance benefit.
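For anyone landing on this thread later, the shape the docs describe is roughly the following. This is only a sketch: pool/fs names and the 8+2 HDD profile are made up, and PG counts are left to the autoscaler.

    # replicated metadata and default data pools, then the filesystem itself
    ceph osd pool create cephfs_metadata
    ceph osd pool create cephfs_default_data
    ceph fs new myfs cephfs_metadata cephfs_default_data

    # bulk data goes into an EC pool that is added afterwards
    ceph osd erasure-code-profile set ec82_hdd k=8 m=2 \
        crush-failure-domain=host crush-device-class=hdd
    ceph osd pool create cephfs_ec_data erasure ec82_hdd
    ceph osd pool set cephfs_ec_data allow_ec_overwrites true
    ceph fs add_data_pool myfs cephfs_ec_data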
I have a CephFS that unfortunately wasn't set up like this: they just made
an EC pool on the slow HDDs the default, which sounds like the worst-case
scenario to me.
It could be worse - those slow HDDs could be attached via USB 1 ;)
I've seen it done.
I would like to add an NVMe data pool to this CephFS,
but the recommendation gives me pause as to whether I should instead go through
the hassle of creating a new CephFS and migrating all users.
That wouldn't be a horrible idea. My understanding, which may be
incomplete, is that one can't factor out and replace the
default/root data pool.
That's my understanding as well.
Something you could do easily would be to edit the CRUSH rule the
pool is using to specify the nvme/ssd device class, and the pool
will migrate. upmap-remapped.py could be used to moderate the
thundering herd. EC still wouldn't be ideal, but this would limit
client disruption.
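For reference, one way to do that without hand-editing the CRUSH map is to create a new rule pinned to the nvme device class and point the pool at it, roughly as below. Names are placeholders, and note this only changes placement; the existing pool keeps its EC profile and k+m. upmap-remapped.py here refers to the script from the CERN ceph-scripts repository.

    ceph osd set norebalance
    # new EC rule restricted to the nvme device class, matching the existing 10+2 layout
    ceph osd erasure-code-profile set ec102_nvme k=10 m=2 \
        crush-failure-domain=host crush-device-class=nvme
    ceph osd crush rule create-erasure ec102_nvme_rule ec102_nvme
    ceph osd pool set cephfs_data crush_rule ec102_nvme_rule
    # map the remapped PGs back to their current OSDs, then let the balancer drain them gradually
    ./upmap-remapped.py | sh
    ceph osd unset norebalance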
I've tried running some mdtest with small 1 KB files to see if I could measure
this difference, but speed is about the same in my relatively small tests
so far. I'm also not sure what impact I should realistically expect here. I
don't even know whether creating files counts as "updating backtraces", so my
testing might just be pointless.
Are you running with a large number of files for an extended period
of time? From multiple clients? Gotta eliminate any cache effects.
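FWIW, a multi-client mdtest run along these lines should help rule out cache effects; the rank count, hostfile and path below are placeholders, adjust to taste:

    # 64 MPI ranks spread over the hosts in clients.txt, files-only workload,
    # 10000 files per rank, 1 KB written and read back per file, 3 iterations,
    # each rank working in its own directory
    mpirun -np 64 --hostfile clients.txt \
        mdtest -F -u -n 10000 -w 1024 -e 1024 -i 3 -d /mnt/cephfs/mdtest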
I guess my core question is: just how important is this recommendation to keep
the default data pool on replicated NVMe?
Setup:
- 14 hosts x 42 HDDs + 3 NVMe for DB/WAL, 2*2x25 GbE bonds
- 12 hosts x 10 NVMe, 2*2x100 GbE bonds
Old CephFS setup:
- metadata: replicated NVMe
- data pools: EC 10+2 on HDD (I plan to add an EC NVMe pool here via
layouts)
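Adding that pool via a layout is straightforward once it is attached to the fs; roughly as follows. Pool name, fs name and path are invented, and the layout only affects files created after the xattr is set.

    ceph fs add_data_pool <fsname> cephfs_nvme_ec
    # point a directory at the new pool; new files below it land there
    setfattr -n ceph.dir.layout.pool -v cephfs_nvme_ec /mnt/cephfs/fast
    getfattr -n ceph.dir.layout /mnt/cephfs/fast    # sanity check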
New CephFS setup as recommended:
- metadata: replicated NVMe
- data pools: replicated NVMe (default), EC 8+2 on HDD via layout, EC 8+2
on NVMe via layout.
Glad to see that you aren't making k+m = the number of hosts.
Ceph 18.2.7
Best regards, Mikael
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io