sched: optimize drm_sched_job_add_dependency

Simona Vetter Wed, 28 May 2025 05:30:32 -0700

On Mon, May 26, 2025 at 01:27:28PM +0200, Philipp Stanner wrote:
> On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote:
> > On 5/26/25 11:34, Philipp Stanner wrote:
> > > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote:
> > > > On 5/23/25 16:16, Danilo Krummrich wrote:
> > > > > On Fri, May 23, 2025 at 04:11:39PM +0200, Danilo Krummrich
> > > > > wrote:
> > > > > > On Fri, May 23, 2025 at 02:56:40PM +0200, Christian König
> > > > > > wrote:
> > > > > > > It turned out that we can actually massively optimize here.
> > > > > > > 
> > > > > > > The previous code was horrible inefficient since it
> > > > > > > constantly
> > > > > > > released
> > > > > > > and re-acquired the lock of the xarray and started each
> > > > > > > iteration from the
> > > > > > > base of the array to avoid concurrent modification which in
> > > > > > > our
> > > > > > > case
> > > > > > > doesn't exist.
> > > > > > > 
> > > > > > > Additional to that the xas_find() and xas_store() functions
> > > > > > > are
> > > > > > > explicitly
> > > > > > > made in a way so that you can efficiently check entries and
> > > > > > > if
> > > > > > > you don't
> > > > > > > find a match store a new one at the end or replace existing
> > > > > > > ones.
> > > > > > > 
> > > > > > > So use xas_for_each()/xa_store() instead of
> > > > > > > xa_for_each()/xa_alloc().
> > > > > > > It's a bit more code, but should be much faster in the end.
> > > > > > 
> > > > > > This commit message does neither explain the motivation of
> > > > > > the
> > > > > > commit nor what it
> > > > > > does. It describes what instead belongs into the changelog
> > > > > > between versions.
> > > > > 
> > > > > Sorry, this is wrong. I got confused, the commit message is
> > > > > perfectly fine. :)
> > > > > 
> > > > > The rest still applies though.
> > > > > 
> > > > > > Speaking of versioning of the patch series, AFAIK there were
> > > > > > previous versions,
> > > > > > but this series was sent as a whole new series -- why?
> > > > > > 
> > > > > > Please resend with a proper commit message, version and
> > > > > > changelog. Thanks!
> > > > 
> > > > 
> > > > Well Philip asked to remove the changelog. I'm happy to bring it
> > > > back, but yeah...
> > > 
> > > No no no no :D
> > > 
> > > Philipp asked for the changelog to be removed *from the git commit
> > > message*; because it doesn't belong / isn't useful there.
> > > 
> > > If there's a cover letter, the changelog should be in the cover
> > > letter.
> > > If there's no cover letter, it should be between the ---
> > > separators:
> > 
> > I can live with that, just clearly state what you want.
> 
> Sure thing:
> 
>  * Patches and patch series's should contain their version identifier
>    within the square brackets [PATCH v3]. git format-patch -v3 does
>    that automatically.
>  * Changelog should be as described above
>  * Ideally, cover letters always contain the full changelog, v2, v3 and
>    so on, so that new readers get a sense of the evolution of the
>    series.
> 
> > 
> > For DRM the ask is often to keep the changelog in the commit message
> > or remove it entirely.
> 
> Yup, I've seen that a few times. I think we, the DRM community, should
> stop that. It's just not useful and makes the commit messages larger,
> both for the human reader while scrolling, as for the hard drive
> regarding storage size


I do occasionally find it useful as a record of different approaches
considered, which sometimes people fail to adequately cover in their
commit messages. Also useful indicator of how cursed a patch is :-)

But as long as anything relevant does end up in the commit message and
people don't just delete stuff I don't care how it's done at all. It's
just that the cost of deleting something that should have been there can
be really nasty sometimes, and storage is cheap.
-Sima

> 
> 
> Thx
> P.
> 
> 
> > 
> > Regards,
> > Christian.
> > 
> > > 
> > > 
> > > Signed-off-by: Gordon Freeman <free...@blackmesa.org>
> > > Reviewed-by: Alyx Vance <a...@vance.edu>
> > > ---
> > > Changes in v2:
> > >   - Provide more docu for crowbar-alloc-function.
> > >   - Use NULL pointers for reserved xarray entries
> > > ---
> > > <DIFF>
> > > 
> > > 
> > > P.
> > > 
> > > 
> > > > 
> > > > Regards,
> > > > Christian.
> > > > 
> > > > > > 
> > > > > > > Signed-off-by: Christian König <christian.koe...@amd.com>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/scheduler/sched_main.c | 29
> > > > > > > ++++++++++++++++++--------
> > > > > > >  1 file changed, 20 insertions(+), 9 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > index f7118497e47a..cf200b1b643e 100644
> > > > > > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > > > > > @@ -871,10 +871,8 @@ EXPORT_SYMBOL(drm_sched_job_arm);
> > > > > > >  int drm_sched_job_add_dependency(struct drm_sched_job
> > > > > > > *job,
> > > > > > >   struct dma_fence *fence)
> > > > > > >  {
> > > > > > > + XA_STATE(xas, &job->dependencies, 0);
> > > > > > >   struct dma_fence *entry;
> > > > > > > - unsigned long index;
> > > > > > > - u32 id = 0;
> > > > > > > - int ret;
> > > > > > >  
> > > > > > >   if (!fence)
> > > > > > >   return 0;
> > > > > > > @@ -883,24 +881,37 @@ int
> > > > > > > drm_sched_job_add_dependency(struct
> > > > > > > drm_sched_job *job,
> > > > > > >   * This lets the size of the array of deps scale with
> > > > > > > the number of
> > > > > > >   * engines involved, rather than the number of BOs.
> > > > > > >   */
> > > > > > > - xa_for_each(&job->dependencies, index, entry) {
> > > > > > > + xa_lock(&job->dependencies);
> > > > > > > + xas_for_each(&xas, entry, ULONG_MAX) {
> > > > > > >   if (entry->context != fence->context)
> > > > > > >   continue;
> > > > > > >  
> > > > > > >   if (dma_fence_is_later(fence, entry)) {
> > > > > > >   dma_fence_put(entry);
> > > > > > > - xa_store(&job->dependencies, index,
> > > > > > > fence, GFP_KERNEL);
> > > > > > > + xas_store(&xas, fence);
> > > > > > >   } else {
> > > > > > >   dma_fence_put(fence);
> > > > > > >   }
> > > > > > > - return 0;
> > > > > > > + xa_unlock(&job->dependencies);
> > > > > > > + return xas_error(&xas);
> > > > > > >   }
> > > > > > >  
> > > > > > > - ret = xa_alloc(&job->dependencies, &id, fence,
> > > > > > > xa_limit_32b, GFP_KERNEL);
> > > > > > > - if (ret != 0)
> > > > > > > +retry:
> > > > > > > + entry = xas_store(&xas, fence);
> > > > > > > + xa_unlock(&job->dependencies);
> > > > > > > +
> > > > > > > + /* There shouldn't be any concurrent add, so no need
> > > > > > > to loop again */
> > > > > > 
> > > > > > Concurrency shouldn't matter, xas_nomem() stores the pre-
> > > > > > allocated memory in the
> > > > > > XA_STATE not the xarray. Hence, I think we should remove the
> > > > > > comment.
> > > > > > 
> > > > > > > + if (xas_nomem(&xas, GFP_KERNEL)) {
> > > > > > > + xa_lock(&job->dependencies);
> > > > > > > + goto retry;
> > > > > > 
> > > > > > Please don't use a goto here, if we would have failed to
> > > > > > allocate
> > > > > > memory here,
> > > > > > this would be an endless loop until we succeed eventually. It
> > > > > > would be equal to:
> > > > > > 
> > > > > > while (!ptr) {
> > > > > > ptr = kmalloc();
> > > > > > }
> > > > > > 
> > > > > > Instead just take the lock and call xas_store() again.
> > > > > > 
> > > > > > > + }
> > > > > > > +
> > > > > > > + if (xas_error(&xas))
> > > > > > >   dma_fence_put(fence);
> > > > > > > + else
> > > > > > > + WARN_ON(entry);
> > > > > > 
> > > > > > Please don't call WARN_ON() here, this isn't fatal, we only
> > > > > > need
> > > > > > to return the
> > > > > > error code.
> > > > 
> > > 
> > 
> 

-- 
Simona Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

Reply via email to