On 05/06/26 13:06, Maíra Canal wrote:
After Rob mentioned to me that Rocket could have a redundant job_lock
similar to the one in Ethos [1], I decided to take a look at the driver to
see if we could remove this lock. However, as I was reading the code, I
realized that the issue goes beyond the job_lock: the job submission
procedure in Rocket breaks the DRM scheduler's design in a fundamental way.
Currently, a job spawns further hardware work from outside the scheduler.
The function rocket_job_run() submits only the first task of an inference;
every subsequent task is submitted by the threaded IRQ handler, which calls
rocket_job_hw_submit() directly.
The scheduler expects all of a job's hardware submission to happen in
run_job(). Submitting tasks from the IRQ handler instead is completely
invisible to the scheduler, which causes issues. For example,
drm_sched_stop() only synchronizes the scheduler's workqueue, not the IRQ,
so the reset path races these IRQ-driven submissions. This creates the need
for a job_lock mutex and the reset.pending flag, which exist only as a
workaround for that self-inflicted race.
Considering the current status of the driver, solving this issue is quite
simple: instead of treating the whole submission as a single DRM sched job,
treat each task as a DRM sched job. With that, the driver can comply with
the DRM scheduler's expectations and get rid of some locks, flags and
indexes.
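To illustrate the difference, here is a minimal userspace toy model, not
kernel code. Every name in it (toy_result, toy_run_chained(),
toy_run_per_task(), stop_after) is hypothetical and only mimics the shape of
the two designs. It ignores locking, fences and job resubmission after a
reset, so take it as a sketch of the idea rather than of the actual patches.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bookkeeping for the toy model; not a driver structure. */
struct toy_result {
	unsigned int sched_visible; /* submissions made from run_job() */
	unsigned int hw_total;      /* submissions the hardware actually saw */
	unsigned int after_stop;    /* submissions made after the scheduler stopped */
};

/*
 * Chained model: one scheduler job per inference. run_job() submits only
 * task 0 and the IRQ handler submits every following task directly.
 * The scheduler is "stopped" once stop_after tasks have completed
 * (0 means it is never stopped).
 */
struct toy_result toy_run_chained(unsigned int ntasks, unsigned int stop_after)
{
	struct toy_result r = {0};
	unsigned int completed = 0;
	bool stopped = false;

	assert(ntasks > 0);

	/* run_job(): the only submission the scheduler knows about. */
	r.sched_visible++;
	r.hw_total++;

	while (completed < ntasks) {
		completed++; /* IRQ: a task finished */
		if (completed == stop_after)
			stopped = true; /* reset path stops the scheduler */

		if (completed < ntasks) {
			/* IRQ handler submits the next task behind the scheduler's back. */
			r.hw_total++;
			if (stopped)
				r.after_stop++;
		}
	}
	return r;
}

/*
 * Per-task model: one scheduler job per task. The IRQ handler only reports
 * completion, and every submission goes through run_job(), which a stopped
 * scheduler no longer calls.
 */
struct toy_result toy_run_per_task(unsigned int ntasks, unsigned int stop_after)
{
	struct toy_result r = {0};
	unsigned int completed = 0;
	bool stopped = false;

	assert(ntasks > 0);

	while (completed < ntasks && !stopped) {
		/* run_job(): called once per task. */
		r.sched_visible++;
		r.hw_total++;

		completed++; /* IRQ: only signals that the task is done */
		if (completed == stop_after)
			stopped = true;
	}
	return r;
}
```

In this toy model, with 4 tasks and a stop after 2 completions, the chained
variant reports 1 scheduler-visible submission, 4 hardware submissions and 2
submissions after the stop, while the per-task variant reports 2, 2 and 0.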
Having said that, this series is only compile-tested, as I don't have this
hardware. I was just driven by Rob's comment to take a look at Rocket's
code, and the design looked unusual compared to what I would expect from a
DRM scheduler-based driver. I'm also CCing some scheduler maintainers to
check if they agree that the IRQ handler shouldn't spawn further HW work.
Apart from that, this series also has some clean-up patches.
[1] https://lore.kernel.org/dri-devel/[email protected]/T/
Sorry, I forgot the RFC tag.
Best regards,
- Maíra
---
Maíra Canal (3):
      drm/rocket: Remove unused reset worker
      drm/rocket: Submit one drm_sched_job per task
      drm/rocket: Drop the dedicated reset workqueue

 drivers/accel/rocket/rocket_core.h |  10 +-
 drivers/accel/rocket/rocket_job.c  | 282 ++++++++++++++++++-------------------
 drivers/accel/rocket/rocket_job.h  |  26 +++-
 3 files changed, 159 insertions(+), 159 deletions(-)
---
base-commit: 640c57d6ca1346a1c2363a3f473b405af979e046
change-id: 20260605-rocket-per-task-jobs-b797f7e2b1e9