On Wed, Sep 14, 2016 at 03:48:44PM -0400, Johannes Weiner wrote:
> From: Johannes Weiner <jwei...@fb.com>
>
> During cgroup2 rollout into production, we started encountering css
> refcount underflows and css access crashes in the memory controller.
> Splitting the heavily shared css reference counter into logical users
> narrowed the imbalance down to the cgroup2 socket memory accounting.
>
> The problem turns out to be the per-cpu charge cache. Cgroup1 had a
> separate socket counter, but the new cgroup2 socket accounting goes
> through the common charge path that uses a shared per-cpu cache for
> all memory that is being tracked. Those caches are safe against
> scheduling preemption, but not against interrupts - such as the newly
> added packet receive path. When cache draining is interrupted by
> network RX taking pages out of the cache, the resuming drain operation
> will put references of in-use pages, thus causing the imbalance.
>
> Disable IRQs during all per-cpu charge cache operations.
>
> Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller")
> Cc: <sta...@vger.kernel.org> # 4.5+
> Signed-off-by: Johannes Weiner <han...@cmpxchg.org>
>
Acked-by: Vladimir Davydov <vdavydov....@gmail.com>
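
For anyone following along, the race described in the changelog can be
sketched as a small userspace model. This is not the kernel code: the
struct, the function names, and the starting value of 32 cached pages
are all made up for illustration, and the "interrupt" is just a function
call placed in the window where the real one could land. In the kernel,
the protected case would presumably correspond to wrapping the cache
operations in local_irq_save()/local_irq_restore().

```c
#include <stdbool.h>

/* Hypothetical userspace model of the per-cpu charge cache, not kernel code. */
struct model {
	long css_refs;	/* references currently held on the cgroup's css */
	long stock;	/* precharged pages sitting in the per-cpu cache */
	long in_use;	/* pages handed out to sockets */
};

/* Models the network RX path charging one page. */
static void rx_charge_one_page(struct model *m)
{
	if (m->stock > 0) {
		/* Fast path: the page leaves the cache and keeps its reference. */
		m->stock--;
	} else {
		/* Slow path: a fresh charge takes a new reference. */
		m->css_refs++;
	}
	m->in_use++;
}

/* Models draining the cache, with the RX "interrupt" placed mid-drain. */
static void drain_cache(struct model *m, bool irqs_disabled)
{
	long nr = m->stock;	/* drain decides how many references to put */

	if (!irqs_disabled)
		rx_charge_one_page(m);	/* interrupt lands inside the drain */

	m->css_refs -= nr;	/* puts a reference the in-use page still needs */
	m->stock = 0;

	if (irqs_disabled)
		rx_charge_one_page(m);	/* deferred until IRQs are re-enabled */
}

/*
 * Returns held references minus the pages that need one.
 * Zero means balanced, negative means a future underflow.
 */
long simulate_drain(bool irqs_disabled)
{
	struct model m = { .css_refs = 32, .stock = 32, .in_use = 0 };

	drain_cache(&m, irqs_disabled);
	return m.css_refs - (m.stock + m.in_use);
}
```

In this model simulate_drain(false) returns -1, i.e. one in-use page is
left without a reference, while simulate_drain(true) returns 0 because
the RX charge runs only after the drain has completed.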