After Rob mentioned to me that Rocket could have a redundant job_lock similar to the one in Ethos [1], I decided to take a look at the driver to see if we could remove this lock. However, as I was reading the code, I found that the job_lock is only a symptom of a deeper issue: the job submission procedure in Rocket breaks the DRM scheduler's design in a fundamental way.

Currently, a job spawns further hardware work from outside the scheduler. The function rocket_job_run() submits only the first task of an inference; every subsequent task is submitted by the threaded IRQ handler, which calls rocket_job_hw_submit() directly.

The scheduler expects all of a job's hardware submission to happen in run_job(). Submissions made from the IRQ handler are completely invisible to the scheduler, which causes problems. For example, drm_sched_stop() only synchronizes with the scheduler's workqueue, not with the IRQ handler, so the reset path races with these IRQ-driven submissions. This is why the driver needs the job_lock mutex and the reset.pending flag, which exist only to work around that self-inflicted race.

Given the current state of the driver, the fix is quite simple: instead of treating the whole submission as one DRM sched job, treat each task as a DRM sched job. With that, the driver can comply with the DRM scheduler's expectations and get rid of some locks, flags and indexes.

Having said that, this is only "compile-tested", as I don't have this hardware. I was just driven by Rob's comment to take a look at Rocket's code, and the design looked unusual compared to what I would expect from a DRM scheduler-based driver.

I'm also CCing some scheduler maintainers to check if they agree that the IRQ handler shouldn't spawn further HW work.

Apart from that, this series also has some clean-up patches.

[1] https://lore.kernel.org/dri-devel/[email protected]/T/

Best regards,
- Maíra

---
Maíra Canal (3):
      drm/rocket: Remove unused reset worker
      drm/rocket: Submit one drm_sched_job per task
      drm/rocket: Drop the dedicated reset workqueue

 drivers/accel/rocket/rocket_core.h |  10 +-
 drivers/accel/rocket/rocket_job.c  | 282 ++++++++++++++++++-------------------
 drivers/accel/rocket/rocket_job.h  |  26 +++-
 3 files changed, 159 insertions(+), 159 deletions(-)
---
base-commit: 640c57d6ca1346a1c2363a3f473b405af979e046
change-id: 20260605-rocket-per-task-jobs-b797f7e2b1e9
