rocket: Fix iommu_group leak and unsafe IRQ register access

赵金明 Wed, 10 Jun 2026 00:58:55 -0700


>Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
>to consider:



>



>New issues:



>- [High] Runtime PM usage counter underflow due to incorrect error handling of 
>`pm_runtime_get_if_active()` in threaded IRQ handler.



>
commits：

Regarding the -EINVAL case in the threaded IRQ handler's 
pm_runtime_get_if_active() call at line 423:

While it is true that a non-zero return value (including -EINVAL) would satisfy 
the if condition, the concern about underflow does not apply in practice. This 
is because pm_runtime_get_if_active() returns -EINVAL only when disable_depth > 
0 (see runtime.c:1234), which means pm_runtime_disable() has been called on the 
device. In this driver, pm_runtime_disable() is only called in 
rocket_core_fini() (rocket_core.c:100), after the IRQ handler has been torn 
down by rocket_job_fini(). So the threaded IRQ handler can never execute while 
PM is disabled ― the two states are mutually exclusive in the driver's 
lifecycle.

Furthermore, even if this theoretical path were somehow reached, the kernel's 
runtime PM subsystem has an explicit underflow guard in rpm_drop_usage_count() 
(runtime.c:1079-1096): it atomically decrements the usage count, and if the 
result is negative, it immediately increments it back and emits a dev_warn. The 
usage count is never allowed to go negative.

That said, the hardirq handler at line 437 already uses != 1 for this exact 
reason. For consistency and defense-in-depth, changing the thread handler to 
use != 1 as well is a reasonable improvement and can be done in a follow-up if 
desired.



>Pre-existing issues:



>- [High] Returning `IRQ_NONE` for valid DMA error interrupts without clearing 
>the interrupt status causes a hardirq storm.



>--



>



>commit 5d8993f6abe579de7eec7f4c98365e11dd141bf8



>Author: ZhaoJinming <[email protected]>



>



>accel/rocket: Fix iommu_group leak and unsafe IRQ register access



>



>This commit fixes an iommu_group reference leak in the IRQ handling path



>and introduces runtime PM checks in the IRQ handlers to prevent unsafe



>hardware register access when the device is suspended.



>



>> diff --git a/drivers/accel/rocket/rocket_job.c 
>> b/drivers/accel/rocket/rocket_job.c



>> index cf9f61682dd8b..6ac79b8517319 100644



>> --- a/drivers/accel/rocket/rocket_job.c



>> +++ b/drivers/accel/rocket/rocket_job.c



>[ ... ]



>> @@ -420,7 +420,10 @@ static irqreturn_t rocket_job_irq_handler_thread(int 
>> irq, void *data)



>>? {



>>?     struct rocket_core *core = data;



>>? 



>> -    rocket_job_handle_irq(core);



>> +    if (pm_runtime_get_if_active(core->dev)) {



>> +            rocket_job_handle_irq(core);



>> +            pm_runtime_put(core->dev);



>> +    }



>>? 



>>?     return IRQ_HANDLED;



>>? }



>



>[Severity: High]



>Does this evaluation handle negative error codes correctly?



>



>pm_runtime_get_if_active() returns 1 if the device is active, 0 if inactive,



>and a negative error code (such as -EINVAL) if runtime PM is disabled. Because



>any non-zero value evaluates to true in C, a negative return value will satisfy



>this condition.



>



>If that happens, the code proceeds to call pm_runtime_put(). Since the initial



>get call failed and did not increment the usage counter, wouldn't this



>unconditionally decrement the counter and lead to an underflow?



>



>Could this explicitly check for success, such as evaluating if the return value



>is strictly equal to 1?



>



>> @@ -428,16 +431,24 @@ static irqreturn_t rocket_job_irq_handler_thread(int 
>> irq, void *data)



>>? static irqreturn_t rocket_job_irq_handler(int irq, void *data)



>>? {



>>?     struct rocket_core *core = data;



>> -    u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);



>> +    u32 raw_status;



>> +



>> +    if (pm_runtime_get_if_active(core->dev) != 1)



>> +            return IRQ_NONE;



>> +



>> +    raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);



>>? 



>>?     WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);



>>?     WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);



>>? 



>>?     if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 ||



>> -    ????? raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1))



>> +    ????? raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) {



>> +            pm_runtime_put(core->dev);



>>?             return IRQ_NONE;



>> +    }



>>? 



>>?     rocket_pc_writel(core, INTERRUPT_MASK, 0x0);



>> +    pm_runtime_put(core->dev);



>>? 



>>?     return IRQ_WAKE_THREAD;



>>? }



>



>[Severity: High]



>This isn't a bug introduced by this patch, but will this cause a hardirq storm



>if a DMA error occurs without a DPU completion?



>



>If an interrupt is raised due to a DMA read or write error, the code logs a



>warning via WARN_ON. However, if the DPU completion bits are not set, the



>handler returns IRQ_NONE without clearing, acknowledging, or masking the



>pending interrupt.



>



>Since the interrupt remains asserted at the hardware level, won't the kernel's



>IRQ core immediately re-invoke the handler in an infinite loop, hanging the CPU



>until the spurious interrupt detector permanently disables the shared IRQ line?



>



>-- 



>Sashiko AI review ・ 
>https://sashiko.dev/#/patchset/[email protected]?part=2



>
Re: Re: [PATCH v4 2/2] accel/rocket: Fix iommu_group leak and unsafe IRQ register access

Reply via email to