>Thank you for your contribution! Sashiko AI review found 4 potential issue(s)
>to consider:
>
>New issues:
>- [High] Unhandled interrupt storm during device removal due to
>`pm_runtime_get_if_active()`.
I try to fix it in next version of patches.
>- [High] NULL pointer dereference in the IRQ thread during teardown
>(Use-After-Free).
Regarding the NULL pointer dereference during teardown concern:
This cannot occur in the current patch. The threaded IRQ handler has three
layers of protection:
pm_runtime_get_if_active() == 1 (line 423): In rocket_core_fini(),
pm_runtime_disable() is called before core->iommu_group = NULL. After
pm_runtime_disable(), disable_depth > 0, so pm_runtime_get_if_active() returns
-EINVAL. Since -EINVAL != 1 is false, the threaded handler never calls
rocket_job_handle_irq() and thus never accesses core->iommu_group. This holds
even if the thread is woken by coalescing after pm_runtime_disable().
scoped_guard(mutex, &core->job_lock) with in_flight_job check (line 345): Even
if the first check were somehow bypassed, rocket_job_handle_irq() only accesses
core->iommu_group inside a scoped_guard that checks core->in_flight_job !=
NULL. By the time rocket_core_fini() runs, drm_sched_fini() has already been
called (via rocket_job_fini()), which stops the scheduler and releases all
jobs, leaving in_flight_job as NULL.
Execution ordering guarantee: rocket_core_fini() calls pm_runtime_disable()
before setting iommu_group = NULL. The pm_runtime_get_if_active() check in the
thread handler is evaluated under spin_lock_irqsave (see
pm_runtime_get_conditional() at runtime.c:1233), which provides the
happens-before guarantee with respect to disable_depth modification in
__pm_runtime_disable().
No code path exists where the threaded handler can reach
iommu_detach_group(NULL, core->iommu_group) after core->iommu_group has been
set to NULL.
>Pre-existing issues:
>- [Critical] Double free of `rjob->tasks` array during job submission error
>handling.
>- [High] NULL pointer dereference when freeing an uninitialized `job->domain`.
>--
>
>commit 7c59a4d5e2e74264bcfe7bc8aa7c903516b1c1c0
>Author: ZhaoJinming <[email protected]>
>
>accel/rocket: Fix iommu_group leak and unsafe IRQ register access
>
>This commit replaces iommu_group_get() with the cached core->iommu_group
>to prevent a reference leak. It also adds pm_runtime_get_if_active() checks
>in the IRQ handler and thread to prevent unsafe hardware register accesses
>when the device is suspended.
>
>> diff --git a/drivers/accel/rocket/rocket_job.c
>> b/drivers/accel/rocket/rocket_job.c
>> index cf9f61682dd8b..21b433826b479 100644
>> --- a/drivers/accel/rocket/rocket_job.c
>> +++ b/drivers/accel/rocket/rocket_job.c
>> @@ -349,7 +349,7 @@ static void rocket_job_handle_irq(struct rocket_core
>> *core)
>>? return;
>>? }
>>?
>> - iommu_detach_group(NULL, iommu_group_get(core->dev));
>> + iommu_detach_group(NULL, core->iommu_group);
>
>[Severity: High]
>Could this lead to a NULL pointer dereference during device teardown?
>
>In rocket_core_fini(), core->iommu_group is set to NULL without acquiring
>core->job_lock or waiting for the IRQ thread to finish:
>
>rocket_core_fini()
>??? ...
>??? core->iommu_group = NULL;
>
>If the threaded IRQ handler runs concurrently after passing the runtime PM
>check, it will pass this newly nulled pointer to iommu_detach_group(),
>causing the IOMMU core to dereference a NULL pointer when acquiring the
>group's mutex.
>
>>? dma_fence_signal(core->in_flight_job->done_fence);
>>? pm_runtime_put_autosuspend(core->dev);
>>? core->in_flight_job = NULL;
>
>[ ... ]
>
>> @@ -428,16 +431,24 @@ static irqreturn_t rocket_job_irq_handler_thread(int
>> irq, void *data)
>>? static irqreturn_t rocket_job_irq_handler(int irq, void *data)
>>? {
>>? struct rocket_core *core = data;
>> - u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);
>> + u32 raw_status;
>> +
>> + if (pm_runtime_get_if_active(core->dev) != 1)
>> + return IRQ_NONE;
>
>[Severity: High]
>Will this cause an unhandled interrupt storm during module removal?
>
>During module removal, rocket_core_fini() disables runtime PM:
>
>rocket_core_fini()
>??? pm_runtime_disable(core->dev);
>
>This happens before the hardware is stopped. If a hardware job completes
>subsequently, the device raises a level-triggered interrupt. Since runtime
>PM is disabled, pm_runtime_get_if_active() returns -EINVAL.
>
>The handler then unconditionally returns IRQ_NONE without clearing the
>interrupt. Because the IRQ is level-triggered, the unacknowledged interrupt
>will immediately refire upon handler exit, potentially causing an infinite
>interrupt storm that deadlocks the CPU.
>
>[Severity: Critical]
>This isn't a bug introduced by this patch, but does the error handling in
>rocket_copy_tasks() cause a double free of the tasks array?
>
>If validation of the incoming tasks fails, the array is freed but the
>pointer is not cleared:
>
>rocket_copy_tasks() {
>??? ...
>fail:
>??? kvfree(rjob->tasks);
>??? return ret;
>}
>
>The caller then jumps to the error label and invokes rocket_job_put(),
>which triggers rocket_job_cleanup(). This cleanup function frees the
>dangling pointer again:
>
>rocket_job_cleanup() {
>??? ...
>??? kvfree(job->tasks);
>}
>
>Can a local unprivileged attacker exploit this via the submit ioctl to
>corrupt kernel memory?
>
>[Severity: High]
>This isn't a bug introduced by this patch, but is there a NULL pointer
>dereference when freeing an uninitialized domain?
>
>When the job structure is allocated, all fields are initialized to zero.
>If setup fails early (for example, in drm_sched_job_init() or
>rocket_copy_tasks()), job->domain remains NULL.
>
>The error handling path invokes rocket_job_cleanup(), which unconditionally
>releases the domain:
>
>rocket_job_cleanup() {
>??? ...
>??? rocket_iommu_domain_put(job->domain);
>}
>
>This function immediately dereferences the pointer:
>
>rocket_iommu_domain_put() {
>??? ...
>??? kref_put(&domain->kref, rocket_iommu_domain_destroy);
>}
>
>Could this cause a kernel panic if an unprivileged user passes invalid
>arguments?
>
>--
>Sashiko AI review ・
>https://sashiko.dev/#/patchset/[email protected]?part=2
>