On Wed, Sep 14, 2016 at 03:48:44PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jwei...@fb.com>
>
> During cgroup2 rollout into production, we started encountering css
> refcount underflows and css access crashes in the memory controller.
> Splitting the heavily shared css reference counter into logical users
> narrowed the imbalance down to the cgroup2 socket memory accounting.
>
> The problem turns out to be the per-cpu charge cache. Cgroup1 had a
> separate socket counter, but the new cgroup2 socket accounting goes
> through the common charge path that uses a shared per-cpu cache for
> all memory that is being tracked. Those caches are safe against
> scheduling preemption, but not against interrupts - such as the newly
> added packet receive path. When cache draining is interrupted by
> network RX taking pages out of the cache, the resuming drain operation
> will put references of in-use pages, thus causing the imbalance.
>
> Disable IRQs during all per-cpu charge cache operations.
>
> Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified
> hierarchy memory controller")
> Cc: <sta...@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <han...@cmpxchg.org>

Acked-by: Vladimir Davydov <vdavydov....@gmail.com>
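
For anyone following along, the race described in the changelog can be
modelled roughly in userspace. This is only a sketch and not the kernel
code: every name below (counter_pages, stock_pages, rx_consume(),
irq_window(), drain(), run_scenario()) is hypothetical, the "interrupt"
is simulated by a plain function call at a fixed point in the drain, and
the irqs_off flag merely stands in for disabling IRQs around the cache
operation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */

static long counter_pages;   /* pages charged to the group's counter */
static long stock_pages;     /* pre-charged pages in the per-cpu cache */
static long pages_in_use;    /* pages handed out to the RX path */
static bool irqs_off;        /* models IRQs being disabled */
static bool irq_pending;     /* an RX interrupt deferred while irqs_off */

/* RX path: take one page, from the cache if possible. */
static void rx_consume(void)
{
	if (stock_pages > 0)
		stock_pages--;       /* served from the cache, already charged */
	else
		counter_pages++;     /* cache empty: charge the counter directly */
	pages_in_use++;
}

/* A point in the drain where an interrupt happens to arrive. */
static void irq_window(void)
{
	if (irqs_off)
		irq_pending = true;  /* deferred until IRQs are re-enabled */
	else
		rx_consume();        /* runs in the middle of the drain */
}

/* Return the cached pages to the counter. */
static void drain(bool disable_irqs)
{
	long n;

	irqs_off = disable_irqs;
	n = stock_pages;             /* read the cache size ... */
	assert(n >= 0);              /* the cache never holds a negative count */
	irq_window();                /* ... an RX interrupt arrives here ... */
	counter_pages -= n;          /* ... then uncharge the count read earlier */
	stock_pages = 0;

	irqs_off = false;
	if (irq_pending) {           /* deferred interrupt runs after the drain */
		irq_pending = false;
		rx_consume();
	}
}

/* Returns counter minus (cached + in-use) pages; 0 means balanced. */
long run_scenario(bool disable_irqs)
{
	counter_pages = 4;
	stock_pages = 4;
	pages_in_use = 0;
	irqs_off = false;
	irq_pending = false;
	drain(disable_irqs);
	return counter_pages - (stock_pages + pages_in_use);
}
```

In this model run_scenario(false) returns -1: the drain uncharges a page
that RX took out of the cache in the meantime, so the counter ends up
one below what is actually in use. run_scenario(true) returns 0, because
the simulated interrupt is held off until the drain has finished and RX
then charges the counter itself.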