Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

Pre-existing issues:
- [High] IOMMU group reference leak in rocket_job_handle_irq()
- [High] Unsafe hardware register access in shared IRQ handler while runtime 
suspended
--

commit fca53bbe333680ce9d4e0368b39e684506994081
Author: ZhaoJinming <[email protected]>
Subject: accel/rocket: Fix fence leak, runtime PM leak and WARN in 
rocket_job_run() error paths

This commit fixes multiple resource leaks in the error paths of 
rocket_job_run().
It replaces pm_runtime_get_sync() with pm_runtime_resume_and_get() to handle
the usage counter, correctly releases dma_fence references on failure, and
adds pm_runtime_put() during IOMMU attachment failures.

> diff --git a/drivers/accel/rocket/rocket_job.c 
> b/drivers/accel/rocket/rocket_job.c
> index 2f1861f960ccb..cf9f61682dd8b 100644
> --- a/drivers/accel/rocket/rocket_job.c
> +++ b/drivers/accel/rocket/rocket_job.c

[ ... ]

> @@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct 
> drm_sched_job *sched_job)
>               dma_fence_put(job->done_fence);
>       job->done_fence = dma_fence_get(fence);
>  
> -     ret = pm_runtime_get_sync(core->dev);
> -     if (ret < 0)
> -             return fence;
> +     ret = pm_runtime_resume_and_get(core->dev);
> +     if (ret < 0) {
> +             dma_fence_put(job->done_fence);
> +             job->done_fence = NULL;
> +             dma_fence_put(fence);
> +             return ERR_PTR(ret);
> +     }
>  
>       ret = iommu_attach_group(job->domain->domain, core->iommu_group);
> -     if (ret < 0)
> -             return fence;
> +     if (ret < 0) {
> +             pm_runtime_put(core->dev);
> +             dma_fence_put(job->done_fence);
> +             job->done_fence = NULL;
> +             dma_fence_put(fence);
> +             return ERR_PTR(ret);
> +     }
>  
>       scoped_guard(mutex, &core->job_lock) {
>               core->in_flight_job = job;

[ ... ]

>  static void rocket_job_handle_irq(struct rocket_core *core)
>  {
>       pm_runtime_mark_last_busy(core->dev);
>  
>       rocket_pc_writel(core, OPERATION_ENABLE, 0x0);
>       rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff);
>  
>       scoped_guard(mutex, &core->job_lock)
>               if (core->in_flight_job) {
>                       if (core->in_flight_job->next_task_idx < 
> core->in_flight_job->task_count) {
>                               rocket_job_hw_submit(core, core->in_flight_job);
>                               return;
>                       }
>  
>                       iommu_detach_group(NULL, iommu_group_get(core->dev));

[Severity: High]
This isn't a bug introduced by this patch, but does this leak an IOMMU
group reference?

It looks like iommu_group_get() increments the kobject reference count for the
group. However, iommu_detach_group() doesn't consume this reference, and there
is no subsequent iommu_group_put() call.

Since this runs on every completed job, could this eventually overflow the
reference counter and prevent the group from being freed?

[ ... ]

>  static irqreturn_t rocket_job_irq_handler(int irq, void *data)
>  {
>       struct rocket_core *core = data;
>       u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS);

[Severity: High]
This is a pre-existing issue, but is it safe to access hardware registers
here without first checking if the device is active?

Since the driver registers this as a shared interrupt handler, it could be
invoked if another device on the same IRQ line triggers an interrupt while the
NPU device is runtime suspended.

Reading from a powered-down device typically returns 0xffffffff, which would
spuriously trigger the WARN_ON macros below and falsely return 
IRQ_WAKE_THREAD.

Should this use pm_runtime_get_if_active() to verify the device is powered
before reading INTERRUPT_RAW_STATUS?

>  
>       WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR);
>       WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_WRITE_ERROR);

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to