On Thu, Mar 12, 2026 at 10:09:10AM -0700, Nhat Pham wrote:
> On Wed, Mar 11, 2026 at 9:01 PM Li Wang <[email protected]> wrote:
> >
> > On Wed, Mar 11, 2026 at 11:50:05AM -0700, Yosry Ahmed wrote:
> > > On Wed, Mar 11, 2026 at 4:05 AM Li Wang <[email protected]> wrote:
> > > >
> > > > test_swapin_nozswap can hit OOM before reaching its assertions on some
> > > > setups. The test currently sets memory.max=8M and then allocates/reads
> > > > 32M with memory.zswap.max=0, which may over-constrain reclaim and kill
> > > > the workload process.
> > > >
> > > > Raise memory.max to 24M so the workload can make forward progress, and
> > > > lower the swap_peak expectation from 24M to 8M to keep the check robust
> > > > across environments.
> > > >
> > > > The test intent is unchanged: verify that swapping happens while zswap
> > > > remains unused when memory.zswap.max=0.
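
For context, the change amounts to roughly the following in
tools/testing/selftests/cgroup/test_zswap.c -- a sketch only, using the
existing cgroup selftest helpers cg_write()/cg_read_long() and the MB()
macro, and assuming the swap_peak check reads memory.swap.peak:

	if (cg_write(test_group, "memory.max", "24M"))	/* was "8M" */
		goto out;
	if (cg_write(test_group, "memory.zswap.max", "0"))
		goto out;
	/* ... the child allocates and reads back 32M ... */
	if (cg_read_long(test_group, "memory.swap.peak") < MB(8)) /* was MB(24) */
		goto out;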
> > > >
> > > > === Error Logs ===
> > > >
> > > >   # ./test_zswap
> > > >   TAP version 13
> > > >   1..7
> > > >   ok 1 test_zswap_usage
> > > >   not ok 2 test_swapin_nozswap
> > > >   ...
> > > >
> > > >   # dmesg
> > > >   [271641.879153] test_zswap invoked oom-killer: gfp_mask=0xcc0(GFP_KERNEL), order=0, oom_score_adj=0
> > > >   [271641.879168] CPU: 1 UID: 0 PID: 177372 Comm: test_zswap Kdump: loaded Not tainted 6.12.0-211.el10.ppc64le #1 VOLUNTARY
> > > >   [271641.879171] Hardware name: IBM,9009-41A POWER9 (architected) 0x4e0202 0xf000005 of:IBM,FW940.02 (UL940_041) hv:phyp pSeries
> > > >   [271641.879173] Call Trace:
> > > >   [271641.879174] [c00000037540f730] [c00000000127ec44] dump_stack_lvl+0x88/0xc4 (unreliable)
> > > >   [271641.879184] [c00000037540f760] [c0000000005cc594] dump_header+0x5c/0x1e4
> > > >   [271641.879188] [c00000037540f7e0] [c0000000005cb464] oom_kill_process+0x324/0x3b0
> > > >   [271641.879192] [c00000037540f860] [c0000000005cbe48] out_of_memory+0x118/0x420
> > > >   [271641.879196] [c00000037540f8f0] [c00000000070d8ec] mem_cgroup_out_of_memory+0x18c/0x1b0
> > > >   [271641.879200] [c00000037540f990] [c000000000713888] try_charge_memcg+0x598/0x890
> > > >   [271641.879204] [c00000037540fa70] [c000000000713dbc] charge_memcg+0x5c/0x110
> > > >   [271641.879207] [c00000037540faa0] [c0000000007159f8] __mem_cgroup_charge+0x48/0x120
> > > >   [271641.879211] [c00000037540fae0] [c000000000641914] alloc_anon_folio+0x2b4/0x5a0
> > > >   [271641.879215] [c00000037540fb60] [c000000000641d58] do_anonymous_page+0x158/0x6b0
> > > >   [271641.879218] [c00000037540fbd0] [c000000000642f8c] __handle_mm_fault+0x4bc/0x910
> > > >   [271641.879221] [c00000037540fcf0] [c000000000643500] handle_mm_fault+0x120/0x3c0
> > > >   [271641.879224] [c00000037540fd40] [c00000000014bba0] ___do_page_fault+0x1c0/0x980
> > > >   [271641.879228] [c00000037540fdf0] [c00000000014c44c] hash__do_page_fault+0x2c/0xc0
> > > >   [271641.879232] [c00000037540fe20] [c0000000001565d8] do_hash_fault+0x128/0x1d0
> > > >   [271641.879236] [c00000037540fe50] [c000000000008be0] data_access_common_virt+0x210/0x220
> > > >   [271641.879548] Tasks state (memory values in pages):
> > > >   ...
> > > >   [271641.879550] [  pid  ]   uid  tgid total_vm      rss rss_anon rss_file rss_shmem pgtables_bytes swapents oom_score_adj name
> > > >   [271641.879555] [ 177372]     0 177372      571        0        0        0         0    51200       96             0 test_zswap
> > > >   [271641.879562] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=/,mems_allowed=0,oom_memcg=/no_zswap_test,task_memcg=/no_zswap_test,task=test_zswap,pid=177372,uid=0
> > > >   [271641.879578] Memory cgroup out of memory: Killed process 177372 (test_zswap) total-vm:36544kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:50kB oom_score_adj:0
> > >
> > > Why are we getting an OOM kill when there's a swap device? Is the
> > > device slow / not keeping up with reclaim pace?
> >
> > This is a good question. The OOM is very likely triggered because memcg
> > reclaim can't make forward progress fast enough within the retry budget
> > of try_charge_memcg().
> >
> > Looking at the OOM info, the system has 64K pages, so memory.max=8M gives
> > only 128 pages. At OOM time, RSS is 0 and swapents is only 96. Swap space
> > itself isn't full; the charge path simply gave up trying to reclaim.
> >
> > The core issue, I guess, is that with memory.zswap.max=0, every page
> > reclaimed must go through the real block device. The charge path works
> > like this: a page fault fires, charge_memcg tries to charge 64K to the
> > cgroup, the cgroup is at its limit, so try_charge_memcg attempts direct
> > reclaim to free space. If the swap device can't drain pages fast enough,
> > the reclaim attempts within the retry loop fail to bring usage below
> > memory.max, and the kernel invokes OOM, even though swap space is
> > technically available.
> >
> > Raising memory.max to 24M gives reclaim a much larger pool to work with,
> > so it can absorb I/O latency without exhausting its retry budget.
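
For reference, the retry logic described above is roughly this (a
simplified paraphrase of try_charge_memcg() in mm/memcontrol.c;
batching, throttling and most of the special cases are elided):

	nr_retries = MAX_RECLAIM_RETRIES;	/* 5 */
retry:
	if (page_counter_try_charge(&memcg->memory, nr_pages, &counter))
		return 0;	/* usage fits under memory.max, charge done */

	/*
	 * Direct reclaim. With memory.zswap.max=0 every reclaimed anon
	 * page has to be written out to the physical swap device.
	 */
	nr_reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages,
						    gfp_mask, ...);
	if (nr_reclaimed && nr_retries--)
		goto retry;

	/*
	 * Retry budget exhausted while writeback is still in flight:
	 * invoke the OOM killer even though swap is not full.
	 */
	mem_cgroup_oom(memcg, gfp_mask, order);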
> 
> Hmmm, perhaps we should change all these constants to multiples of
> base page size of a system?

Yeah, this may be better, let me try it in the next version.
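
Something along these lines, I think (untested sketch; test_group and
the cg_write() helper are from the existing test, sysconf(_SC_PAGESIZE)
returns the runtime base page size, and the page counts are only
illustrative):

	char buf[32];
	long page_size = sysconf(_SC_PAGESIZE);	/* 4K, 64K, ... */

	/* limit = 384 pages: 24M with 64K pages, 1.5M with 4K pages */
	snprintf(buf, sizeof(buf), "%ld", 384 * page_size);
	if (cg_write(test_group, "memory.max", buf))
		goto out;

	/*
	 * Keep the allocation a fixed multiple of the limit: 512 pages
	 * is 32M with 64K pages, matching the current test.
	 */
	size_t alloc_size = 512 * page_size;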

-- 
Regards,
Li Wang

