On Sun, Jun 22, 2025, 8:52 AM Hector Martin <mar...@marcan.st> wrote:
> I believe that something is wrong with the OSD bluestore cache
> allocation/flush policy, and when the cache becomes full it starts
> thrashing reads instead of evicting colder cached data (or perhaps some
> cache bucket is starving another cache bucket of space).
>
> I would appreciate some hints on how to debug this. Are there any cache
> stats I should be looking at, or info on how the cache is partitioned?

For a quick fix or lead, I'd start by disabling bluefs_buffered_io. This
tunable has a strange history of being turned on and off again in Ceph's
past, with symptoms much like the ones you're describing as the reason.
If anything, that lends some credence to what you're seeing.

In terms of getting stats on individual memory pools, you can tell the OSD
to dump its stats, and from that you can discern the memory allocations of
the individual pools/use cases. That might help for the situation you're
describing, given how many components make up an OSD (RocksDB, etc.) and
how many subsystems tend to hold on to memory for their respective use
cases (i.e. they keep the memory in local free-list slabs and don't
necessarily return it to tcmalloc).

I've personally also had problems with the kernel swapping out to disk
when I had plenty of free memory, only to realize it was certain NUMA
zones that were causing the effect. I have no clue whether Apple Silicon
has NUMA zones, but if it does, that's also a variable you can control for
by pinning OSDs to specific NUMA zones for a bit.

Cheers,
Tyler
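P.S. In case it helps, here's roughly what I had in mind. These are from
memory, so double-check the option names against your release, and treat
osd.0 as a placeholder for whichever OSD you're poking at:

    # Turn off buffered BlueFS I/O cluster-wide for OSDs
    # (may need an OSD restart depending on your release)
    ceph config set osd bluefs_buffered_io false

    # Dump the per-pool memory accounting from one OSD's admin socket
    ceph daemon osd.0 dump_mempools

    # tcmalloc heap stats, to see how much memory is held but not returned
    ceph tell osd.0 heap stats

    # Check whether the box even has multiple NUMA nodes before bothering
    # with pinning
    numactl --hardware

If it does turn out to be NUMA-related, pinning with numactl (or, if I
recall correctly, the osd_numa_node config option in newer releases) is
where I'd experiment next.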