Thank you for your contribution! Sashiko AI review found 1 potential issue
to consider:

Pre-existing issues:
- [High] Aborting a DRM scheduler job after calling drm_sched_job_arm() 
corrupts scheduler state.
--

commit e1b7e4281cdb5c3f1d9985be73947531edf81dc5
Author: ZhaoJinming <[email protected]>

accel/rocket: Fix error path handling in rocket_job_run()

This commit replaces pm_runtime_get_sync() with pm_runtime_resume_and_get()
in rocket_job_run() to correctly balance the usage counter on failure. It
also fixes dma_fence reference leaks and prevents returning an unsignaled
fence on error paths.

> diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
> index 2f1861f960ccb..cf9f61682dd8b 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c

[ ... ]

>       scoped_guard(mutex, &rdev->sched_lock) {
>               drm_sched_job_arm(&job->base);
> 
>               job->inference_done_fence = dma_fence_get(&job->base.s_fence->finished);
> 
>               ret = rocket_acquire_object_fences(job->in_bos, job->in_bo_count, &job->base, false);
>               if (ret)
>                       goto err_unlock;

[Severity: High]
This isn't a bug introduced by this patch, but does this sequence corrupt the
scheduler state?

In rocket_job_push(), if rocket_acquire_object_fences() fails, the function
bails out through the err_unlock label, and the caller then invokes
drm_sched_job_cleanup() on a job that has already been armed.

drm_sched_job_arm() initializes the job's scheduler fences and allocates
sequence numbers, and the scheduler documentation states that an armed job
must be submitted with drm_sched_entity_push_job(). Aborting the job after
it has been armed therefore leaves a gap in the sequence numbers. Can this
lead to hangs where a waiter blocks indefinitely on a sequence number that
is never signaled?

Should fence acquisition be performed before arming the job?
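
To make the question concrete, here is a minimal userspace sketch in C. It is
not kernel code: every toy_* name and both submit_* helpers are hypothetical
stand-ins, and the model only counts armed versus pushed jobs. It does not
reproduce the real scheduler's fence, locking, or reservation behavior.

```c
/* Userspace toy model of the two orderings; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct toy_entity {
	uint64_t armed;   /* sequence numbers handed out so far */
	uint64_t pushed;  /* armed jobs that were actually pushed */
};

struct toy_job {
	uint64_t seqno;
	bool armed;
};

/* Stand-in for arming: consumes a sequence number. */
static void toy_job_arm(struct toy_entity *e, struct toy_job *j)
{
	assert(!j->armed);
	j->seqno = ++e->armed;
	j->armed = true;
}

/* Stand-in for pushing: only an armed job may be pushed. */
static void toy_job_push(struct toy_entity *e, struct toy_job *j)
{
	assert(j->armed);
	e->pushed++;
}

/* Stand-in for a fallible fence-acquisition step. */
static int toy_acquire_fences(bool fail)
{
	return fail ? -1 : 0;
}

/* Ordering as in the quoted hunk: arm first, then the fallible step. */
static int submit_arm_first(struct toy_entity *e, bool fail)
{
	struct toy_job j = {0};

	toy_job_arm(e, &j);
	if (toy_acquire_fences(fail))
		return -1;	/* armed job dropped; its seqno is never pushed */
	toy_job_push(e, &j);
	return 0;
}

/* Suggested ordering: run the fallible step before arming. */
static int submit_acquire_first(struct toy_entity *e, bool fail)
{
	struct toy_job j = {0};

	if (toy_acquire_fences(fail))
		return -1;	/* nothing armed yet, so unwinding is safe */
	toy_job_arm(e, &j);
	toy_job_push(e, &j);
	return 0;
}

/* Sequence numbers that were allocated but never pushed. */
static uint64_t toy_orphaned(const struct toy_entity *e)
{
	return e->armed - e->pushed;
}
```

In this model, a failure in the arm-first ordering leaves one armed job that
is never pushed, while the acquire-first ordering unwinds with nothing armed.
Whether the same reordering is possible in rocket_job_push() depends on
locking and reservation details that the quoted hunk does not show.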

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=1
