Excellent info, Anthony. Many thanks,
Steven

On Fri, 1 Aug 2025 at 09:29, Anthony D'Atri <a...@dreamsnake.net> wrote:
> > The servers are dedicated to Ceph.
> > Yes, it is perhaps too much, but my IT philosophy is "there is always room
> > for more RAM", as it usually helps running things faster.
>
> Unless you're a certain Sun model, but I digress...
>
> The $ spent on all that RAM would IMHO have been more effectively spent
> choosing NVMe SSDs instead of HDDs. And you wouldn't have had to pay for
> the HBA.
>
> > Now, since I have it, I would like to use it as efficiently as possible.
>
> That's what the autotuner is all about.
>
> > The 3 NVMes are 15TB, dedicated to OSDs - there are 2 more 1.6TB devices
> > dedicated to DB/WAL. HDDs are 20TB and SSDs are 7TB.
> > Is my understanding correct that autotune will dedicate 70% to OSDs
> > indiscriminately? ... or is there some sort of algorithm for
> > differentiating between disk type and size?
>
> There is not as far as I know, but honestly you're dramatically into the
> territory where you have so much that there wouldn't be much to be gained
> by customizing. Diminishing returns.
>
> > If NVMe is SSD
>
> NVMe is an interface, not a medium. A SATA SSD and an NVMe SSD are the
> same NAND (at similar cost) with a different interface.
>
> > from the autotune perspective, it would probably make sense to tune it
> > manually, no?
>
> If you like. See the page below for setting it manually on a host-by-host
> or per-OSD basis. Or disable it and do the math to divide it by device
> class as you like, though note that if you add or remove OSDs from a given
> host the system won't adjust without intervention. Or you could add less
> expensive systems with less RAM and move some RAM from these hosts to them
> to even things out.
>
> Since you don't have much else contending for that RAM, you might
>
> ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.93
>
> which will let the autotuner use even more for OSDs.
>
> > How would I check the status of autotune ... other than checking
> > individual OSD config?
>
> # ceph config dump | grep osd_memory_target
> osd   host:cephab92   basic     osd_memory_target            12083522051
> osd   host:cephac0f   basic     osd_memory_target            12083519715
> osd   host:dd13-25    basic     osd_memory_target            6156072793
> osd   host:dd13-29    basic     osd_memory_target            6235526712
> osd   host:dd13-33    basic     osd_memory_target            6780274813
> osd   host:dd13-37    basic     osd_memory_target            6601357610
> osd   host:i18-24     basic     osd_memory_target            6670077087
> osd   host:i18-28     basic     osd_memory_target            6600879861
> osd   host:k18-23     basic     osd_memory_target            6663117330
> osd   host:l18-24     basic     osd_memory_target            6822190406
> osd   host:l18-28     basic     osd_memory_target            6782421978
> osd   host:m18-33     basic     osd_memory_target            6593523272
> osd                   advanced  osd_memory_target_autotune   true
>
> Here the first two hosts have much more RAM than the others, so the
> autotuner has more to distribute.
>
> You can game the autotuner in various ways, see
> https://www.ibm.com/docs/en/storage-ceph/8.0.0?topic=osds-automatically-tuning-osd-memory
>
> https://docs.ceph.com/en/latest/rados/configuration/ceph-conf/#sections-and-masks
>
> > Many thanks
> > Steven
> >
> > On Thu, 31 Jul 2025 at 10:43, Anthony D'Atri <a...@dreamsnake.net> wrote:
> >
> >> IMHO the autotuner is awesome.
> >>
> >> 1TB of RAM is an embarrassment of riches -- are these hosts perhaps
> >> converged compute+storage?
> >>
> >> > On Jul 31, 2025, at 10:17 AM, Steven Vacaroaia <ste...@gmail.com> wrote:
> >> >
> >> > Hi
> >> >
> >> > What is the best practice / your expert advice about using
> >> > osd_memory_target_autotune on hosts with lots of RAM?
> >> >
> >> > My hosts have 1 TB RAM, only 3 NVMes, 12 HDDs and 12 SSDs.
> >>
> >> Remember that NVMe devices *are* SSDs ;) I'm guessing those are used for
> >> WAL+DB offload, and thus you have 24x OSDs per host?
> >>
> >> > Should I disable autotune and allocate more RAM?
> >>
> >> The autotuner by default will divide 70% of physmem across all the OSDs
> >> it finds on a given host, with 30% allocated for the OS and other
> >> daemons. I *think* any RGWs, mons, etc. are assumed to be part of that
> >> 30% but am not positive.
> >>
> >> > I saw some suggestion for 16GB to NVMe, 8GB to SSD and 6GB to HDD.
> >>
> >> I personally have a growing sense that more RAM actually can help slower
> >> OSDs more, at least with respect to rebalancing without rampant slow
> >> ops. ymmv.
> >>
> >> This implies that your NVMe devices are standalone OSDs, so that would
> >> mean 27 OSDs per node? I'm curious what manner of chassis this is.
> >>
> >> I would then expect the autotuner to set osd_memory_target to roughly
> >> 26GB, which is ample by any measure. ~307GB will be available for
> >> non-OSD processes.
> >>
> >> If you're running compute or other significant non-Ceph workloads on the
> >> same nodes, you can adjust the reservation factor by setting
> >> ceph config set mgr mgr/cephadm/autotune_memory_target_ratio xxx. So if
> >> you want to reserve more for non-OSD processes, something like
> >>
> >> ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.1
> >>
> >> If you do have hungry compute colocated, a good value might be something
> >> like 0.25, which would give each OSD more than 9GB for
> >> osd_memory_target. If you do want to allot different amounts to
> >> different device classes, you can instead set static values, using
> >> central config device class masks.
> >>
> >> > Many thanks
> >> > Steven

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
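For anyone who wants to try the manual, per-device-class approach discussed above, here is a minimal sketch using central-config device class masks (the sections-and-masks page linked in the thread). The class names and byte values are assumptions, not recommendations: check what classes your OSDs actually report with `ceph osd tree` (NVMe OSDs often show up as "ssd" rather than "nvme") and size the targets to suit your hardware.

# Turn off the autotuner so the static targets below take effect
ceph config set osd osd_memory_target_autotune false

# Per-device-class targets via config masks (values in bytes; example sizes only)
ceph config set osd/class:hdd  osd_memory_target 6442450944     # 6 GiB
ceph config set osd/class:ssd  osd_memory_target 8589934592     # 8 GiB
ceph config set osd/class:nvme osd_memory_target 17179869184    # 16 GiB

# Spot-check what an individual OSD resolves to
ceph config get osd.0 osd_memory_target

As noted earlier in the thread, static values like these won't adjust themselves if you add or remove OSDs on a host, so revisit them whenever the OSD count per node changes.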