On Mon, May 18, 2026 at 04:55:12PM +0100, Matthew Auld wrote:
> On 18/05/2026 15:14, Francois Dugast wrote:
> > When split_block() fails it returns before calling mark_split(), leaving
> > the block in the FREE state and still linked in the rbtree.  The four
> > err_undo paths then call __gpu_buddy_free() without first removing the
> > block from the tree, which leads to two distinct bugs:
> > 
> >   - If the buddy is also free, __gpu_buddy_free() merges the two siblings
> >     by calling gpu_block_free(mm, block) while block->rb is still linked
> >     in the tree.  Any subsequent rbtree traversal will follow the now-
> >     dangling pointer, causing a use-after-free.
> > 
> >   - In alloc_from_freetree(), where there is no buddy guard,
> >     __gpu_buddy_free() always reaches mark_free() -> rbtree_insert() with
> >     block still in the tree, corrupting the rbtree.
> > 
> > The same pattern is already used correctly in __force_merge(): call
> > rbtree_remove() to unlink the block before handing it to
> > __gpu_buddy_free().  Apply the same fix to all four err_undo sites.
> > 
> > Reported-by: Sashiko <[email protected]>
> > Signed-off-by: Francois Dugast <[email protected]>
> > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > ---
> >   drivers/gpu/buddy.c | 16 ++++++++++++----
> >   1 file changed, 12 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
> > index eb1457376307..dac2027bb64a 100644
> > --- a/drivers/gpu/buddy.c
> > +++ b/drivers/gpu/buddy.c
> > @@ -737,8 +737,10 @@ __alloc_range_bias(struct gpu_buddy *mm,
> >     buddy = __get_buddy(block);
> >     if (buddy &&
> >         (gpu_buddy_block_is_free(block) &&
> > -        gpu_buddy_block_is_free(buddy)))
> > +        gpu_buddy_block_is_free(buddy))) {
> > +           rbtree_remove(mm, block);
> >             __gpu_buddy_free(mm, block, false);
> > +   }
> >     return ERR_PTR(err);
> >   }
> > @@ -847,8 +849,10 @@ alloc_from_freetree(struct gpu_buddy *mm,
> >     return block;
> >   err_undo:
> > -   if (tmp != order)
> > +   if (tmp != order) {
> > +           rbtree_remove(mm, block);
> 
> Actually, I think this needs the same check as elsewhere? Say we fail
> on the first split: nothing was actually split, right?

I think this is unnecessary: block is already checked above with
BUG_ON(!gpu_buddy_block_is_free(block)). If split_block() fails, it does
so before mark_split(), so block remains free. If the buddy is not free,
__gpu_buddy_free() skips the merge loop but still calls mark_free(), so
the net effect is a remove followed by a re-insert.

Also, the checks are added with patch #3 and the introduction of
__gpu_buddy_undo_splits().
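
To illustrate the remove + re-insert argument, here is a minimal stand-alone
sketch. It is not code from buddy.c: a doubly linked list stands in for the
rbtree, and every toy_* name is hypothetical. toy_free() only models the tail
of __gpu_buddy_free() (mark free, then insert), and the double insert is
counted rather than allowed to corrupt anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buddy block; not the real gpu_buddy_block. */
struct toy_block {
	struct toy_block *prev, *next;
	bool linked;	/* true while the block sits in the free structure */
	bool is_free;
};

/* Hypothetical stand-in for the allocator; a list replaces the rbtree. */
struct toy_mm {
	struct toy_block head;	/* sentinel node */
	int nr_linked;
	int nr_double_insert;	/* inserts attempted on an already linked block */
};

static void toy_init(struct toy_mm *mm)
{
	mm->head.prev = mm->head.next = &mm->head;
	mm->nr_linked = 0;
	mm->nr_double_insert = 0;
}

/* Models rbtree_insert(): a real tree would be corrupted by a double insert,
 * here we only count the event and refuse it. */
static void toy_insert(struct toy_mm *mm, struct toy_block *b)
{
	if (b->linked) {
		mm->nr_double_insert++;
		return;
	}
	b->next = mm->head.next;
	b->prev = &mm->head;
	mm->head.next->prev = b;
	mm->head.next = b;
	b->linked = true;
	mm->nr_linked++;
}

/* Models rbtree_remove(): unlink the block if it is currently linked. */
static void toy_remove(struct toy_mm *mm, struct toy_block *b)
{
	if (!b->linked)
		return;
	b->prev->next = b->next;
	b->next->prev = b->prev;
	b->prev = b->next = NULL;
	b->linked = false;
	mm->nr_linked--;
}

/* Models only the mark_free() -> insert tail of the free path. */
static void toy_free(struct toy_mm *mm, struct toy_block *b)
{
	b->is_free = true;
	toy_insert(mm, b);
}

/* Undo path without the fix: the block is still linked when freed. */
static void toy_undo_buggy(struct toy_mm *mm, struct toy_block *b)
{
	toy_free(mm, b);
}

/* Undo path with the fix: unlink first, then free (remove + re-insert). */
static void toy_undo_fixed(struct toy_mm *mm, struct toy_block *b)
{
	toy_remove(mm, b);
	toy_free(mm, b);
}
```

With a block that is free and linked (the state after a failed split in this
model), toy_undo_fixed() leaves exactly one linked copy and no double insert,
while toy_undo_buggy() trips the double-insert counter.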

Francois

> 
> >             __gpu_buddy_free(mm, block, false);
> > +   }
> >     return ERR_PTR(err);
> >   }
> > @@ -968,8 +972,10 @@ gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
> >     buddy = __get_buddy(block);
> >     if (buddy &&
> >         (gpu_buddy_block_is_free(block) &&
> > -        gpu_buddy_block_is_free(buddy)))
> > +        gpu_buddy_block_is_free(buddy))) {
> > +           rbtree_remove(mm, block);
> >             __gpu_buddy_free(mm, block, false);
> > +   }
> >     return ERR_PTR(err);
> >   }
> > @@ -1054,8 +1060,10 @@ static int __alloc_range(struct gpu_buddy *mm,
> >     buddy = __get_buddy(block);
> >     if (buddy &&
> >         (gpu_buddy_block_is_free(block) &&
> > -        gpu_buddy_block_is_free(buddy)))
> > +        gpu_buddy_block_is_free(buddy))) {
> > +           rbtree_remove(mm, block);
> >             __gpu_buddy_free(mm, block, false);
> > +   }
> >   err_free:
> >     if (err == -ENOSPC && total_allocated_on_err) {
> 
