On Tue, Nov 3, 2020 at 6:30 AM Seena Fallah <seenafal...@gmail.com> wrote:
>
> Hi all,
>
> Is this guide still valid for a Bluestore deployment with Nautilus
> or Octopus?
> https://tracker.ceph.com/projects/ceph/wiki/Tuning_for_All_Flash_Deployments

Some of the guidance is of course outdated.

E.g., at the time of that writing, 1x 40GbE was indeed state of the
art in the networking world, but now 100GbE network cards are
affordable, and with 6 NVMe drives per server, even that might be a
bottleneck if the clients use a large block size (>64KB) and do an
fsync() only at the end.
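
A back-of-the-envelope check (the ~3 GB/s per drive is only an
assumption, substitute the numbers for your hardware):

    6 NVMe drives x ~3 GB/s sustained each  = ~18 GB/s
    1x 100GbE link = 100 Gbit/s             = ~12.5 GB/s

I.e. the aggregate drive throughput can exceed what a single 100GbE
link carries, so for streaming workloads the network is still the
limit.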

Regarding NUMA tuning, Ceph has made some progress. With Nautilus or
later, if the OSD finds that your NVMe drive and your network card are
on the same NUMA node, it will pin itself to that NUMA node
automatically. In other words: choose strategically which PCIe slots
to use, maybe use two network cards, and you will not have to do any
manual tuning or pinning.
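
For example, on Nautilus or later you can check what the OSDs detected
and whether the automatic affinity is enabled (osd.0 is just a
placeholder here):

    ceph osd numa-status
    ceph config get osd.0 osd_numa_auto_affinity

If memory serves, there is also an "osd numa node" option for manual
pinning, but with a sane PCIe layout you should not need it.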

Partitioning the NVMe was also popular advice in the past, but now
that the "osd op num shards" and "osd op num threads per shard"
parameters exist, with sensible default values, this is something that
tends not to help.
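
If you want to double-check what your OSDs actually run with (the SSD
defaults are 8 shards x 2 threads per shard, if I remember correctly):

    ceph config show osd.0 osd_op_num_shards_ssd
    ceph config show osd.0 osd_op_num_threads_per_shard_ssd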

Filesystem considerations in that document obviously apply only to
Filestore, which is something you should not use.

A large number of PGs per OSD makes the data distribution more
uniform, but actually hurts performance a little bit.

The advice regarding the "performance" cpufreq governor is still
valid, but you might also look at disabling the deepest CPU idle
states (i.e. benchmark it for your workload specifically).
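
On most distributions that is something like this (cpupower syntax as
I remember it; the 10 us latency cut-off is only an example to
benchmark, not a recommendation):

    cpupower frequency-set -g performance
    cpupower idle-set -D 10   # disable idle states deeper than 10 us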

-- 
Alexander E. Patrakov
CV: http://pc.cd/PLz7