On Sun, Jun 22, 2025, 8:52 AM Hector Martin <mar...@marcan.st> wrote:
> I believe that something is wrong with the OSD bluestore cache
> allocation/flush policy, and when the cache becomes full it starts
> thrashing reads instead of evicting colder cached data (or perhaps some
> cache bucket is starving another cache bucket of space).
>
> I would appreciate some hints on how to debug this. Are there any cache
> stats I should be looking at, or info on how the cache is partitioned?

For a quick fix or lead, I'd start by disabling bluefs_buffered_io. This
tunable has a strange history of being turned on and off again in Ceph's
past, with symptoms much like the ones you're describing as the reason.
If anything, that lends some credence to what you're seeing.

In terms of getting stats on individual memory pools, you can tell the OSD
to dump its stats, and from that you can discern the memory allocations of
the individual pools/use cases. That might help for the situation you're
describing, given how many components make up an OSD (RocksDB, etc.) and
how many subsystems tend to hold on to memory for their respective use
cases (i.e. they keep the memory in local free-list slabs and don't
necessarily return it to tcmalloc).

I've personally also had problems with the kernel swapping out to disk
when I had plenty of free memory, only to realize it was certain NUMA
zones that were causing the effect. I have no clue whether Apple Silicon
has NUMA zones, but if it does, that's also a variable you can control for
by pinning OSDs to specific NUMA zones for a bit.

Cheers,
Tyler
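P.S. In case it helps, here's roughly what I had in mind. These are from
memory, so double-check the option names against your release, and treat
osd.0 as a placeholder for whichever OSD you're poking at:

    # Turn off buffered BlueFS I/O cluster-wide for OSDs
    # (may need an OSD restart depending on your release)
    ceph config set osd bluefs_buffered_io false

    # Dump the per-pool memory accounting from one OSD's admin socket
    ceph daemon osd.0 dump_mempools

    # tcmalloc heap stats, to see how much memory is held but not returned
    ceph tell osd.0 heap stats

    # Check whether the box even has multiple NUMA nodes before bothering
    # with pinning
    numactl --hardware

If it does turn out to be NUMA-related, pinning with numactl (or, if I
recall correctly, the osd_numa_node config option in newer releases) is
where I'd experiment next.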