Hi Rick,
On 22.10.2025 10:34, Rick Macklem wrote:
> A couple of people have reported problems with NFS servers,
> where essentially all of the system's memory gets exhausted.
> They see the problem on 14.n FreeBSD servers (which use the
> newer ZFS code) but not on 13.n servers.
> I am trying to learn how ZFS handles arc memory use to try
> and figure out what can be done about this problem.
> I know nothing about ZFS internals or UMA(9) internals,
> so I could be way off, but here is what I think is happening.
> (Please correct me on this.)
> The L1ARC uses uma_zalloc_arg()/uma_zfree_arg() to allocate
> the arc memory. The zones are created using uma_zcreate(),
> so they are regular zones. This means the pages are coming
> from a slab in a keg, which are wired pages.
> The only time the size of the slab/keg will be reduced by ZFS
> is when it calls uma_zone_reclaim(.., UMA_RECLAIM_DRAIN),
> which is called by arc_reap_cb(), triggered by arc_reap_cb_check().
> arc_reap_cb_check() uses arc_available_memory() and triggers
> arc_reap_cb() when arc_available_memory() returns a negative
> value.
> arc_available_memory() returns a negative value when
> zfs_arc_free_target (vfs.zfs.arc.free_target) is greater than freemem.
> (By default, zfs_arc_free_target is set to vm_cnt.v_free_target.)
> Does all of the above sound about right?
There are two mechanisms to reduce the ARC size: either from the ZFS
side in the way you described, or from the kernel side, when it calls
the ZFS low-memory handler arc_lowmem(). It feels somewhat like
overkill, but it came this way from Solaris.
Once the ARC size is reduced and evictions into the UMA caches have
happened, it is up to UMA how to drain its caches. ZFS might trigger
that itself, it can be done by the kernel, or, as of a few years ago,
there is a mechanism I added for UMA caches to slowly shrink by
themselves even without memory pressure.
> This leads me to...
> - zfs_arc_free_target (vfs.zfs.arc.free_target) needs to be larger
There is a very delicate balance between ZFS and the kernel
(zfs_arc_free_target = vm_cnt.v_free_target). An imbalance there makes
one of them suffer.
> or
> - Most of the wired pages in the slab are per-cpu,
> so uma_zone_reclaim() needs to use UMA_RECLAIM_DRAIN_CPU
> on some systems. (Not the small test systems I have, where I
> cannot reproduce the problem.)
Per-CPU caches should be relatively small, IIRC on the order of dozens
or hundreds of allocations per CPU. Draining them is expensive and
should rarely be needed, unless you have too little RAM for the number
of CPUs you have.
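
For reference, at the API level the difference is just the request
argument to uma_reclaim()/uma_zone_reclaim(). A hedged sketch of forcing
both kinds of drain from a throwaway kernel module (untested, and the
header list is approximate):

/*
 * Throwaway module that forces the two kinds of UMA drain once at load
 * time, for comparison only.  Untested sketch.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/errno.h>
#include <sys/kernel.h>
#include <sys/module.h>
#include <vm/uma.h>

static int
uma_drain_demo_handler(module_t mod, int event, void *arg)
{
	switch (event) {
	case MOD_LOAD:
		/* Release the bucket caches of all zones back to the kegs/VM. */
		uma_reclaim(UMA_RECLAIM_DRAIN);
		/* The heavier variant also flushes the per-CPU caches. */
		uma_reclaim(UMA_RECLAIM_DRAIN_CPU);
		return (0);
	case MOD_UNLOAD:
		return (0);
	default:
		return (EOPNOTSUPP);
	}
}

static moduledata_t uma_drain_demo_mod = {
	"uma_drain_demo",
	uma_drain_demo_handler,
	NULL
};

DECLARE_MODULE(uma_drain_demo, uma_drain_demo_mod, SI_SUB_DRIVERS,
    SI_ORDER_MIDDLE);

uma_zone_reclaim(zone, UMA_RECLAIM_DRAIN_CPU) does the same for a single
zone, which is what the ARC reaper would have to pass if per-CPU caches
really were the problem.
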
> or
> - uma_zone_reclaim() needs to be called under other
> circumstances.
> or
> - ???
> How can you tell if a keg/slab is per-cpu?
> (For my simple test system, I only see "UMA Slabs 0:" and
> "UMA Slabs 1:". It looks like UMA Slabs 0: is being used for
> ZFS arc allocation for this simple test system.)
> Hopefully folk who understand ZFS arc allocation or UMA
> can jump in and help out, rick
Before you dive into UMA, have you checked whether the ARC size really
shrinks and eviction happens? Since you mention NFS, I wonder what your
number of open files is. Too many open files can in some cases restrict
ZFS's ability to evict metadata from the ARC. arc_summary may give some
insight into the ARC state.
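
If you want to watch it from userland, the relevant counters are
exported as sysctls. A small sketch, assuming the usual
kstat.zfs.misc.arcstats.* OIDs from the FreeBSD OpenZFS module:

/*
 * Read the current ARC size and target via sysctl to see whether the
 * ARC actually shrinks under pressure.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <err.h>
#include <stdint.h>
#include <stdio.h>

static uint64_t
arcstat(const char *name)
{
	uint64_t val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) != 0)
		err(1, "%s", name);
	return (val);
}

int
main(void)
{
	printf("ARC size   %ju MiB\n",
	    (uintmax_t)arcstat("kstat.zfs.misc.arcstats.size") >> 20);
	printf("ARC target %ju MiB\n",
	    (uintmax_t)arcstat("kstat.zfs.misc.arcstats.c") >> 20);
	printf("ARC c_max  %ju MiB\n",
	    (uintmax_t)arcstat("kstat.zfs.misc.arcstats.c_max") >> 20);
	return (0);
}

Running that (or just sysctl kstat.zfs.misc.arcstats, or arc_summary)
repeatedly while the server is under load should show whether the size
actually follows the target down.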
--
Alexander Motin