On Wednesday, 26 November 2025 18:28:55 CET Krzysztof Niemiec wrote:
> On 2025-11-25 at 19:06:32 GMT, Janusz Krzysztofik wrote:
> > Re-sending because of my response unintentionally HTML formatted, with
> > correct
> > email address of Tvrtko by the way.
> >
> >
> > Hi Krzysztof,
> >
> > On Tuesday, 25 November 2025 14:33:38 CET Krzysztof Niemiec wrote:
> > > Initialize eb->vma[].vma pointers to NULL when the eb structure is first
> > > set up.
> > >
> > > During the execution of eb_lookup_vmas(), the eb->vma array is
> > > successively filled up with struct eb_vma objects. This process includes
> > > calling eb_add_vma(), which might fail; however, even in the event of
> > > failure, eb->vma[i].vma is set for the currently processed buffer.
> > >
> > > If eb_add_vma() fails, eb_lookup_vmas() returns with an error, which
> > > prompts a call to eb_release_vmas() to clean up the mess. Since
> > > eb_lookup_vmas() might fail during processing any (possibly not first)
> > > buffer, eb_release_vmas() checks whether a buffer's vma is NULL to know
> > > at what point did the lookup function fail.
> > >
> > > In eb_lookup_vmas(), eb->vma[i].vma is set to NULL if either the helper
> > > function eb_lookup_vma() or eb_validate_vma() fails. eb->vma[i+1].vma is
> > > set to NULL in case i915_gem_object_userptr_submit_init() fails; the
> > > current one needs to be cleaned up by eb_release_vmas() at this point,
> > > so the next one is set. If eb_add_vma() fails, neither the current nor
> > > the next vma is nullified, which is a source of a NULL deref bug
> > > described in [1].
> > >
> > > When entering eb_lookup_vmas(), the vma pointers are set to the slab
> > > poison value, instead of NULL.
> >
> >
> > Your commit description still doesn't answer my question why the whole
> > memory
> > area allocated to the table of VMAs is not initialized to 0 on allocation,
> > only left populated with poison values.
> >
>
> Becuase kvmalloc_array() is used. [1]
>
> I guess one could swap it to a call to kvcalloc() or something similar;
> the thing is that the call actually handles both allocations of
> exec_list2 and the eb_vma array, the former doesn't need to be
> zero-initialized, the latter technically also doesn't but it simplifies
> error paths (and fixes the linked bug). I'm not sure if a
> zero-initializing *alloc() would be more readable or not here.
To my taste, zeroing on allocation would be a more clean solution.
But, while being at it, please have a still closer look, especially at these
two statements:
at
drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:i915_gem_execbuffer2_ioctl():3588
/* Allocate extra slots for use by the command parser */
exec2_list = kvmalloc_array(count + 2, eb_element_size(),
__GFP_NOWARN | GFP_KERNEL);
at drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c:i915_gem_do_execbuffer():3354
eb.vma = (struct eb_vma *)(exec + args->buffer_count + 1);
Why do we allocate space for 2 tables of count size plus 2 extra pairs of
their elements, but then place the second table at count + 1 offset, leaving
space for only one extra element of the first type?
Looking at git history, there was a couple of excessively complex patches and
reverts that apparently introduced that discrepancy. Unfortunately, none of
them, with exception of the one that introduced the above shown inline
comment, provided a clear justification why we need to switch from 1 to 2 or
vice versa.
Anyway, depending on how that extra space is actually used by the command
parser, we may or may not get into troubles with that, so we should better fix
it, I believe.
Thanks,
Janusz
>
> Thanks
> Krzysztof
>
> [1]
> https://elixir.bootlin.com/linux/v6.17.9/source/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c#L3586
>
> > Thanks,
> > Janusz
>