On Mon, May 18, 2026 at 04:55:12PM +0100, Matthew Auld wrote:
> On 18/05/2026 15:14, Francois Dugast wrote:
> > When split_block() fails it returns before calling mark_split(), leaving
> > the block in the FREE state and still linked in the rbtree. The four
> > err_undo paths then call __gpu_buddy_free() without first removing the
> > block from the tree, which leads to two distinct bugs:
> >
> > - If the buddy is also free, __gpu_buddy_free() merges the two siblings
> > by calling gpu_block_free(mm, block) while block->rb is still linked
> > in the tree. Any subsequent rbtree traversal will follow the now-
> > dangling pointer, causing a use-after-free.
> >
> > - In alloc_from_freetree(), where there is no buddy guard,
> > __gpu_buddy_free() always reaches mark_free() -> rbtree_insert() with
> > block still in the tree, corrupting the rbtree.
> >
> > The same pattern is already used correctly in __force_merge(): call
> > rbtree_remove() to unlink the block before handing it to
> > __gpu_buddy_free(). Apply the same fix to all four err_undo sites.
> >
> > Reported-by: Sashiko <[email protected]>
> > Signed-off-by: Francois Dugast <[email protected]>
> > Assisted-by: GitHub Copilot:claude-sonnet-4.6
> > ---
> > drivers/gpu/buddy.c | 16 ++++++++++++----
> > 1 file changed, 12 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/buddy.c b/drivers/gpu/buddy.c
> > index eb1457376307..dac2027bb64a 100644
> > --- a/drivers/gpu/buddy.c
> > +++ b/drivers/gpu/buddy.c
> > @@ -737,8 +737,10 @@ __alloc_range_bias(struct gpu_buddy *mm,
> > buddy = __get_buddy(block);
> > if (buddy &&
> > (gpu_buddy_block_is_free(block) &&
> > - gpu_buddy_block_is_free(buddy)))
> > + gpu_buddy_block_is_free(buddy))) {
> > + rbtree_remove(mm, block);
> > __gpu_buddy_free(mm, block, false);
> > + }
> > return ERR_PTR(err);
> > }
> > @@ -847,8 +849,10 @@ alloc_from_freetree(struct gpu_buddy *mm,
> > return block;
> > err_undo:
> > - if (tmp != order)
> > + if (tmp != order) {
> > + rbtree_remove(mm, block);
>
> Actually, I think this needs the same check as elsewhere? Say we fail
> on the first split? Nothing was actually split, right?

I think this is unnecessary: block is already checked above with
BUG_ON(!gpu_buddy_block_is_free(block)). If split_block() fails, it fails
before mark_split(), so block remains free. If the buddy is not free, the
merge loop in __gpu_buddy_free() is skipped but mark_free() is still
called, so we do remove + re-insert.

Also, the checks are added in patch #3, along with the introduction of
__gpu_buddy_undo_splits().
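
To illustrate the remove + re-insert argument, here is a toy model. It is
only a sketch: all toy_* names are made up for this mail, the "tree" is
reduced to a linked flag and a counter, and none of it is the real buddy.c
code. It models the case where the first split_block() fails and the buddy
is not free, so __gpu_buddy_free() skips the merge loop and goes straight
to mark_free().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buddy block; only tracks tree membership. */
struct toy_block {
	bool is_free;
	bool linked;	/* true while the block sits in the toy free tree */
};

static int toy_tree_count;	/* number of blocks currently linked */
static int toy_corruptions;	/* counts inserts of an already-linked block */

/* Models rbtree_insert(): inserting a linked node would corrupt a real tree. */
static void toy_insert(struct toy_block *b)
{
	if (b->linked) {
		toy_corruptions++;
		return;
	}
	b->linked = true;
	toy_tree_count++;
}

/* Models rbtree_remove(): the block must be linked when we unlink it. */
static void toy_remove(struct toy_block *b)
{
	assert(b->linked);
	b->linked = false;
	toy_tree_count--;
}

/* Models the no-merge path of the free routine: only the mark_free() step. */
static void toy_free_no_merge(struct toy_block *b)
{
	b->is_free = true;
	toy_insert(b);
}

/*
 * Models err_undo after the first split failed: the block is still free
 * and still linked. Returns the number of corruptions observed.
 */
static int toy_undo(struct toy_block *b, bool remove_first)
{
	toy_corruptions = 0;
	if (remove_first)
		toy_remove(b);
	toy_free_no_merge(b);
	return toy_corruptions;
}
```

In this model, toy_undo(&b, true) on a free, linked block leaves it linked
exactly once, while toy_undo(&b, false) trips the double-insert counter,
which corresponds to the corruption described in the commit message.
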
Francois
>
> > __gpu_buddy_free(mm, block, false);
> > + }
> > return ERR_PTR(err);
> > }
> > @@ -968,8 +972,10 @@ gpu_buddy_offset_aligned_allocation(struct gpu_buddy *mm,
> > buddy = __get_buddy(block);
> > if (buddy &&
> > (gpu_buddy_block_is_free(block) &&
> > - gpu_buddy_block_is_free(buddy)))
> > + gpu_buddy_block_is_free(buddy))) {
> > + rbtree_remove(mm, block);
> > __gpu_buddy_free(mm, block, false);
> > + }
> > return ERR_PTR(err);
> > }
> > @@ -1054,8 +1060,10 @@ static int __alloc_range(struct gpu_buddy *mm,
> > buddy = __get_buddy(block);
> > if (buddy &&
> > (gpu_buddy_block_is_free(block) &&
> > - gpu_buddy_block_is_free(buddy)))
> > + gpu_buddy_block_is_free(buddy))) {
> > + rbtree_remove(mm, block);
> > __gpu_buddy_free(mm, block, false);
> > + }
> > err_free:
> > if (err == -ENOSPC && total_allocated_on_err) {
>