Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

New issues:
- [High] Unhandled interrupt storm during device removal due to 
`pm_runtime_get_if_active()`.
- [High] NULL pointer dereference in the IRQ thread during teardown 
(Use-After-Free).

Pre-existing issues:
- [Critical] Double free of `rjob->tasks` array during job submission error 
handling.
- [High] NULL pointer dereference when freeing an uninitialized `job->domain`.
--

commit 7c59a4d5e2e74264bcfe7bc8aa7c903516b1c1c0
Author: ZhaoJinming <[email protected]>

accel/rocket: Fix iommu_group leak and unsafe IRQ register access

This commit replaces iommu_group_get() with the cached core->iommu_group
to prevent a reference leak. It also adds pm_runtime_get_if_active() checks
in the IRQ handler and thread to prevent unsafe hardware register accesses
when the device is suspended.

> diff --git a/drivers/accel/rocket/rocket_job.c 
> b/drivers/accel/rocket/rocket_job.c
> index cf9f61682dd8b..21b433826b479 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c
> @@ -349,7 +349,7 @@ static void rocket_job_handle_irq(struct rocket_core 
> *core)
>                               return;
>                       }
>  
> -                     iommu_detach_group(NULL, iommu_group_get(core->dev));
> +                     iommu_detach_group(NULL, core->iommu_group);

[Severity: High]
Could this lead to a NULL pointer dereference during device teardown?

In rocket_core_fini(), core->iommu_group is set to NULL without acquiring
core->job_lock or waiting for the IRQ thread to finish:

rocket_core_fini()
    ...
    core->iommu_group = NULL;

If the threaded IRQ handler runs concurrently after passing the runtime PM
check, it will pass this newly nulled pointer to iommu_detach_group(),
causing the IOMMU core to dereference a NULL pointer when acquiring the
group's mutex.

>                       dma_fence_signal(core->in_flight_job->done_fence);
>                       pm_runtime_put_autosuspend(core->dev);
>                       core->in_flight_job = NULL;

[ ... ]

> @@ -428,16 +431,24 @@ static irqreturn_t rocket_job_irq_handler_thread(int 
> irq, void *data)
>  static irqreturn_t rocket_job_irq_handler(int irq, void *data)
>  {
>       struct rocket_core *core = data;
> -     u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
> +     u32 raw_status;
> +
> +     if (pm_runtime_get_if_active(core->dev) != 1)
> +             return IRQ_NONE;

[Severity: High]
Will this cause an unhandled interrupt storm during module removal?

During module removal, rocket_core_fini() disables runtime PM:

rocket_core_fini()
    pm_runtime_disable(core->dev);

This happens before the hardware is stopped. If a hardware job completes
subsequently, the device raises a level-triggered interrupt. Since runtime
PM is disabled, pm_runtime_get_if_active() returns -EINVAL.

The handler then unconditionally returns IRQ_NONE without clearing the
interrupt. Because the IRQ is level-triggered, the unacknowledged interrupt
will immediately refire upon handler exit, potentially causing an infinite
interrupt storm that deadlocks the CPU.

[Severity: Critical]
This isn't a bug introduced by this patch, but does the error handling in
rocket_copy_tasks() cause a double free of the tasks array?

If validation of the incoming tasks fails, the array is freed but the
pointer is not cleared:

rocket_copy_tasks() {
    ...
fail:
    kvfree(rjob->tasks);
    return ret;
}

The caller then jumps to the error label and invokes rocket_job_put(),
which triggers rocket_job_cleanup(). This cleanup function frees the
dangling pointer again:

rocket_job_cleanup() {
    ...
    kvfree(job->tasks);
}

Can a local unprivileged attacker exploit this via the submit ioctl to
corrupt kernel memory?

[Severity: High]
This isn't a bug introduced by this patch, but is there a NULL pointer
dereference when freeing an uninitialized domain?

When the job structure is allocated, all fields are initialized to zero.
If setup fails early (for example, in drm_sched_job_init() or
rocket_copy_tasks()), job->domain remains NULL.

The error handling path invokes rocket_job_cleanup(), which unconditionally
releases the domain:

rocket_job_cleanup() {
    ...
    rocket_iommu_domain_put(job->domain);
}

This function immediately dereferences the pointer:

rocket_iommu_domain_put() {
    ...
    kref_put(&domain->kref, rocket_iommu_domain_destroy);
}

Could this cause a kernel panic if an unprivileged user passes invalid
arguments?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=2

Reply via email to