On Wed, Mar 04, 2026 at 02:39:47PM +0100, Vlastimil Babka (SUSE) wrote:
> On 3/4/26 2:30 AM, Harry Yoo wrote:
> > [+Cc adding Catalin for kmemleak bits]
> > On Mon, Mar 02, 2026 at 09:39:48AM +0100, Vlastimil Babka (SUSE) wrote:
> >> On 3/2/26 04:41, Qing Wang wrote:
> >>> #syz test
> >>>
> >>> diff --git a/mm/slub.c b/mm/slub.c
> >>> index cdc1e652ec52..387979b89120 100644
> >>> --- a/mm/slub.c
> >>> +++ b/mm/slub.c
> >>> @@ -6307,15 +6307,21 @@ bool __kfree_rcu_sheaf(struct kmem_cache *s, void *obj)
> >>>                   goto fail;
> >>>  
> >>>           if (!local_trylock(&s->cpu_sheaves->lock)) {
> >>> -                 barn_put_empty_sheaf(barn, empty);
> >>> +                 if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES)
> >>> +                         barn_put_empty_sheaf(barn, empty);
> >>> +                 else
> >>> +                         free_empty_sheaf(s, empty);
> >>>                   goto fail;
> >>>           }
> >>>  
> >>>           pcs = this_cpu_ptr(s->cpu_sheaves);
> >>>  
> >>> -         if (unlikely(pcs->rcu_free))
> >>> -                 barn_put_empty_sheaf(barn, empty);
> >>> -         else
> >>> +         if (unlikely(pcs->rcu_free)) {
> >>> +                 if (barn && data_race(barn->nr_empty) < MAX_EMPTY_SHEAVES)
> >>> +                         barn_put_empty_sheaf(barn, empty);
> >>> +                 else
> >>> +                         free_empty_sheaf(s, empty);
> >>> +         } else
> >>>                   pcs->rcu_free = empty;
> >>>   }
> >>
> >> I don't think this would fix any leak, and syzbot agrees. It would limit
> >> the empty sheaves in barn more strictly, but they are not leaked.
> >> Hm, I don't see any leak in __kfree_rcu_sheaf() or rcu_free_sheaf(). I
> >> wonder if kmemleak lacks visibility into barns or pcs's as roots for
> >> searching what objects are considered referenced, or something?
> > 
> > Objects allocated from the slab and percpu allocators should be
> > properly tracked by kmemleak, but those allocated with
> > gfpflags_allow_spinning() == false are not.
> > 
> > When barns and sheaves are allocated early (!gfpflags_allow_spinning()
> > due to gfp_allowed_mask), the allocation skips
> > kmemleak_alloc_recursive(), which can produce false positives: from
> > kmemleak's point of view, the objects are not reachable from the root
> > set (data section, stacks, etc.).
> 
> Good point.
> 
> > To me it seems kmemleak should gain allow_spin == false support
> > sooner or later.
> 
> Or we figure out how to deal with the spurious allow_spin == false during
> boot. Here I'm a bit confused how exactly it happens, because AFAICS in
> slub we apply gfp_allowed_mask only when allocating a new slab, and in
> slab_post_alloc_hook() we apply it to init_mask. That is indeed passed
> to kmemleak_alloc_recursive() but not used for the
> gfpflags_allow_spinning() decision. kmemleak_alloc_recursive() should
> succeed because nobody should be holding any locks that would require
> spinning.

I don't fully understand what goes on. If kmemleak_alloc_recursive()
failed to allocate for some reason (other than SLAB_NOLEAKTRACE), it
would loudly disable kmemleak altogether and stop reporting leaks. Also,
kmemleak itself doesn't care about allow_spin; it's only the slub code
that avoids calling kmemleak when spinning is not allowed (since kmemleak
takes some locks and may call back into the slab allocator).

I wonder whether some early kmem_cache_node allocations, like the ones in
early_kmem_cache_node_alloc(), are not tracked, so kmemleak cannot find
n->barn. I got lost in the slub code, but something like this:

-----------8<-----------------------------------
diff --git a/mm/slub.c b/mm/slub.c
index 0c906fefc31b..401557ff5487 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -7513,6 +7513,7 @@ static void early_kmem_cache_node_alloc(int node)
        slab->freelist = get_freepointer(kmem_cache_node, n);
        slab->inuse = 1;
        kmem_cache_node->node[node] = n;
+       kmemleak_alloc(n, sizeof(*n), 1, GFP_NOWAIT);
        init_kmem_cache_node(n, NULL);
        inc_slabs_node(kmem_cache_node, node, slab->objects);
 
-------------8<----------------------------------------

Another thing I noticed, not sure it's related, but we should probably
make kmemleak ignore an object once it has been passed to
kvfree_call_rcu(), similar to what we do on the main path of this
function. Also see commit 5f98fd034ca6 ("rcu: kmemleak: Ignore kmemleak
false positives when RCU-freeing objects") where we added this
kmemleak_ignore().

---------8<-----------------------------------
diff --git a/mm/slab_common.c b/mm/slab_common.c
index d5a70a831a2a..73f4668d870d 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1954,8 +1954,14 @@ void kvfree_call_rcu(struct rcu_head *head, void *ptr)
        if (!head)
                might_sleep();
 
-       if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kfree_rcu_sheaf(ptr))
+       if (!IS_ENABLED(CONFIG_PREEMPT_RT) && kfree_rcu_sheaf(ptr)) {
+               /*
+                * The object is now queued for deferred freeing via an RCU
+                * sheaf. Tell kmemleak to ignore it.
+                */
+               kmemleak_ignore(ptr);
                return;
+       }
 
        // Queue the object but don't yet schedule the batch.
        if (debug_rcu_head_queue(ptr)) {
----------------8<-----------------------------------

-- 
Catalin

