On Mon, Apr 09, 2018 at 06:12:11PM -0700, Matthew Wilcox wrote:
> On Tue, Apr 10, 2018 at 08:04:09AM +0900, Minchan Kim wrote:
> > On Mon, Apr 09, 2018 at 08:20:32AM -0700, Matthew Wilcox wrote:
> > > I don't think this is something the radix tree should know about.
> > 
> > Because the shadow entry implementation is hidden by the radix tree implementation.
> > IOW, radix tree users cannot know how it works.
> I have no idea what you mean.
> > > SLAB should be checking for it (the patch I posted earlier in this
> > 
> > I don't think that's the right approach. A SLAB constructor can
> > initialize metadata for the objects in a newly populated slab page,
> > not just zero them. However, __GFP_ZERO only means clearing pages, not
> > initializing metadata. So the semantics differ, and there is no need
> > to mix them.
> No, __GFP_ZERO is specified to clear the allocated memory whether
> you're allocating from alloc_pages or from slab.  What makes no sense
> is allocating an object from slab with a constructor *and* __GFP_ZERO.
> They're in conflict, and slab can't fulfill both of those requirements.

This is stable material. If you really think that makes sense,
please submit a patch separately.

> > > thread), but the right place to filter this out is in the caller of
> > > radix_tree_maybe_preload -- it's already filtering out HIGHMEM pages,
> > > and should filter out GFP_ZERO too.
> > 
> > radix_tree_[maybe_]preload are exported APIs, which are error-prone
> > for out-of-tree modules or future callers.
> > 
> > A more appropriate place is __radix_tree_preload.
> I could not disagree with you more.  It is the responsibility of the
> callers of radix_tree_preload to avoid calling it with nonsense flags
> like __GFP_DMA, __GFP_HIGHMEM or __GFP_ZERO.

How about this?

It would fix the current problem and warn about potential bugs as well.
radix_tree_preload already does such a warning, and
radix_tree_maybe_preload already skips preloading for gfp masks it
cannot use.

From 27ecf7a009d3570d1155c528c7f08040ede68ed3 Mon Sep 17 00:00:00 2001
From: Minchan Kim <minc...@kernel.org>
Date: Tue, 10 Apr 2018 11:20:11 +0900
Subject: [PATCH v2] mm: workingset: fix NULL ptr dereference

The workingset shadow entry code assumes that a newly allocated radix
tree node has node->private_list in the list_empty() state. Currently,
that list is initialized in the SLAB constructor, which means a radix
tree node is initialized only when *slub allocates a new page*, not
when *slub allocates a new object*.

If some filesystem or subsystem passes __GFP_ZERO in gfp_mask, the slab
allocator's memset leaves the newly allocated node with a zeroed
node->private_list, for which list_empty() returns false. This ends up
in a NULL pointer dereference in workingset_update_node() once the
list_empty() check fails.

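This patch fixes it by stripping __GFP_ZERO (with a warning) in
radix_tree_preload() and by skipping the preload in
radix_tree_maybe_preload() when __GFP_ZERO is set.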

Fixes: 449dd6984d0e ("mm: keep page cache radix tree nodes in check")
Cc: Johannes Weiner <han...@cmpxchg.org>
Cc: Jan Kara <j...@suse.cz>
Cc: Matthew Wilcox <wi...@infradead.org>
Cc: Jaegeuk Kim <jaeg...@kernel.org>
Cc: Chao Yu <yuch...@huawei.com>
Cc: Christopher Lameter <c...@linux.com>
Cc: linux-fsde...@vger.kernel.org
Reported-by: Chris Fries <cfr...@google.com>
Signed-off-by: Minchan Kim <minc...@kernel.org>
---
 lib/radix-tree.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index da9e10c827df..9d68f2a7888e 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -511,6 +511,16 @@ int radix_tree_preload(gfp_t gfp_mask)
        /* Warn on non-sensical use... */
+       /*
+        * A newly allocated node must have node->private_list in the
+        * INIT_LIST_HEAD state; the workingset shadow entry code relies on
+        * it. If a caller passes __GFP_ZERO by mistake, the slab allocator
+        * clears node->private_list, which leads to a BUG. Rather than
+        * going Oops, just fix the mask and warn about it.
+        */
+       if (WARN_ON(gfp_mask & __GFP_ZERO))
+               gfp_mask &= ~__GFP_ZERO;
        return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
@@ -522,7 +532,7 @@ EXPORT_SYMBOL(radix_tree_preload);
 int radix_tree_maybe_preload(gfp_t gfp_mask)
-       if (gfpflags_allow_blocking(gfp_mask))
+       if (gfpflags_allow_blocking(gfp_mask) && !(gfp_mask & __GFP_ZERO))
                return __radix_tree_preload(gfp_mask, RADIX_TREE_PRELOAD_SIZE);
        /* Preloading doesn't help anything with this gfp mask, skip it */

