Hi Maíra, Am Montag, dem 02.06.2025 um 10:22 -0300 schrieb Maíra Canal: > Commit 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still > active") ensured that active jobs are returned to the pending list when > extending the timeout. However, it didn't use the pending list's lock to > manipulate the list, which causes a race condition as the scheduler's > workqueues are running. > > Hold the lock while manipulating the scheduler's pending list to prevent > a race. > > Cc: sta...@vger.kernel.org > Fixes: 704d3d60fec4 ("drm/etnaviv: don't block scheduler when GPU is still > active") > Signed-off-by: Maíra Canal <mca...@igalia.com>
Reviewed-by: Lucas Stach <l.st...@pengutronix.de> Feel free to take this through drm-misc. This code is really ugly and I'm really happy to see your work on a better solution for extending the timeouts. Thanks for working on that! Regards, Lucas > --- > Hi, > > I'm proposing this workaround patch to address the race-condition caused > by manipulating the pending list without using its lock. Although I > understand this isn't a complete solution (see [1]), it's not reasonable > to backport the new DRM stat series [2] to the stable branches. > > Therefore, I believe the best solution is backporting this fix to the > stable branches, which will fix the race and will keep adding the job > back to the pending list (which will avoid most memory leaks). > > [1] > https://lore.kernel.org/dri-devel/bcc0ed477f8a6f3bb06665b1756bcb98fb7af871.ca...@mailbox.org/ > [2] > https://lore.kernel.org/dri-devel/20250530-sched-skip-reset-v2-0-c40a8d2d8...@igalia.com/ > > Best Regards, > - Maíra > --- > drivers/gpu/drm/etnaviv/etnaviv_sched.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 76a3a3e517d8..71e2e6b9d713 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -35,6 +35,7 @@ static enum drm_gpu_sched_stat > etnaviv_sched_timedout_job(struct drm_sched_job > *sched_job) > { > struct etnaviv_gem_submit *submit = to_etnaviv_submit(sched_job); > + struct drm_gpu_scheduler *sched = sched_job->sched; > struct etnaviv_gpu *gpu = submit->gpu; > u32 dma_addr, primid = 0; > int change; > @@ -89,7 +90,9 @@ static enum drm_gpu_sched_stat > etnaviv_sched_timedout_job(struct drm_sched_job > return DRM_GPU_SCHED_STAT_NOMINAL; > > out_no_timeout: > - list_add(&sched_job->list, &sched_job->sched->pending_list); > + spin_lock(&sched->job_list_lock); > + list_add(&sched_job->list, &sched->pending_list); > + spin_unlock(&sched->job_list_lock); > return DRM_GPU_SCHED_STAT_NOMINAL; > } >