On Mon, Mar 23, 2026 at 5:46 AM Li Wang <[email protected]> wrote:
>
> On Fri, Mar 20, 2026 at 04:42:35PM -0400, Waiman Long wrote:
> > The vmstats flush threshold currently increases linearly with the
> > number of online CPUs. As the number of CPUs increases over time, it
> > will become increasingly difficult to meet the threshold and update the
> > vmstats data in a timely manner. These days, systems with hundreds of
> > CPUs or even thousands of them are becoming more common.
> >
> > For example, the test_memcg_sock test of test_memcontrol always fails
> > when running on an arm64 system with 128 CPUs. It is because the
> > threshold is now 64*128 = 8192. With 4k page size, it needs changes in
> > 32 MB of memory. It will be even worse with larger page size like 64k.
> >
> > To make the output of memory.stat more correct, it is better to scale
> > up the threshold slower than linearly with the number of CPUs. The
> > int_sqrt() function is a good compromise as suggested by Li Wang [1].
> > An extra 2 is added to make sure that we will double the threshold for
> > a 2-core system. The increase will be slower after that.
> >
> > With the int_sqrt() scale, we can use the possibly larger
> > num_possible_cpus() instead of num_online_cpus() which may change at
> > run time.
> >
> > Although there is supposed to be a periodic and asynchronous flush of
> > vmstats every 2 seconds, the actual time lag between successive runs
> > can actually vary quite a bit. In fact, I have seen time lags of up
> > to 10s of seconds in some cases. So we cannot rely too much on the hope
> > that there will be an asynchronous vmstats flush every 2 seconds. This
> > may be something we need to look into.
> >
> > [1] https://lore.kernel.org/lkml/[email protected]/
> >
> > Suggested-by: Li Wang <[email protected]>
> > Signed-off-by: Waiman Long <[email protected]>

What's the motivation for this fix? Is it purely to make tests more
reliable on systems with larger page sizes?

IMO we need some performance tests to make sure we're not flushing too
eagerly with the sqrt scale. In particular, we need to make sure that
we don't end up performing worse when there are a lot of cgroups and a
lot of concurrent flushers.
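
For a rough sense of how much more eagerly we would flush, here is a
userspace sketch of the arithmetic. It is not the code from the patch.
The batch size of 64 is taken from the 64*128 = 8192 example above, and
the sqrt form (int_sqrt(ncpus + 2)) is only my reading of "an extra 2
is added". The helper names are made up for illustration.

```c
/* Batch size assumed from the "64*128 = 8192" example in the quoted text. */
#define CHARGE_BATCH 64UL

/* Userspace stand-in for the kernel's int_sqrt(): returns floor(sqrt(x)). */
static unsigned long isqrt_floor(unsigned long x)
{
	unsigned long r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Current behaviour as described: threshold grows linearly with CPU count. */
static unsigned long threshold_linear(unsigned long ncpus)
{
	return CHARGE_BATCH * ncpus;
}

/*
 * ASSUMPTION: the proposed scale is CHARGE_BATCH * int_sqrt(ncpus + 2).
 * This doubles the threshold at 2 CPUs, matching the description, but the
 * exact expression in the patch may differ.
 */
static unsigned long threshold_sqrt(unsigned long ncpus)
{
	return CHARGE_BATCH * isqrt_floor(ncpus + 2);
}

/* Amount of memory (in bytes) that must change before the threshold is hit. */
static unsigned long long threshold_bytes(unsigned long pages,
					  unsigned long page_size)
{
	return (unsigned long long)pages * page_size;
}
```

Under those assumptions, 128 CPUs with 4k pages gives
threshold_linear(128) = 8192 pages (32 MB), while threshold_sqrt(128) =
64 * 11 = 704 pages (about 2.75 MB), so the threshold would be reached
roughly 11x sooner on that system.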
