Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

Pre-existing issues:
- [High] Leaking `panthor_group` reference count when `tiler_oom_work` is 
successfully cancelled during group unbind.
- [High] Ignored return value from `group_bind_locked()` in `tick_ctx_apply()` 
can lead to firmware state corruption and hardware hangs.
--

commit 3008df4d374d0ccc21ff32d8af0ae14af924b9fd
Author: Boris Brezillon <[email protected]>

drm/panthor: Automate CSG IRQ processing at group unbind time

This commit moves the sched_process_csg_irq_locked() call into
group_unbind_locked() to automate CSG IRQ processing at group unbind time.
This eliminates the need to manually call it in tick_ctx_apply() and
panthor_sched_suspend(). The patch also relocates group_bind_locked() and
group_unbind_locked() to avoid forward declarations.

> diff --git a/drivers/gpu/drm/panthor/panthor_sched.c 
> b/drivers/gpu/drm/panthor/panthor_sched.c
> index c8818718d1353..124999297aec9 100644
> --- a/drivers/gpu/drm/panthor/panthor_sched.c
> +++ b/drivers/gpu/drm/panthor/panthor_sched.c

[ ... ]

> @@ -1972,6 +1892,88 @@ void panthor_sched_report_fw_events(struct 
> panthor_device *ptdev, u32 events)
>       }
>  }
>  
> +/**
> + * group_bind_locked() - Bind a group to a group slot
> + * @group: Group.
> + * @csg_id: Slot.
> + *
> + * Return: 0 on success, a negative error code otherwise.
> + */
> +static int
> +group_bind_locked(struct panthor_group *group, u32 csg_id)

[Severity: High]
This isn't a bug introduced by this patch, but could ignoring the return
value of group_bind_locked() in tick_ctx_apply() lead to firmware state
corruption?

If group_bind_locked() fails (e.g., panthor_vm_active() encounters an MMU
timeout), the error is ignored and csg_slot->group remains NULL. 

This causes csg_slot_prog_locked() to return immediately without configuring
the hardware. However, tick_ctx_apply() then unconditionally queues a
CSG_STATE_START or CSG_STATE_RESUME command to the firmware for this slot,
despite it being unprogrammed. 

Can this trigger firmware faults or hardware hangs?

> +{
> +     struct panthor_device *ptdev = group->ptdev;
> +     int ret;
> +

[ ... ]

> +static int
> +group_unbind_locked(struct panthor_group *group)
> +{
> +     struct panthor_device *ptdev = group->ptdev;
> +

[ ... ]

> +     panthor_vm_idle(group->vm);
> +
> +     /* Tiler OOM events will be re-issued next time the group is scheduled. 
> */
> +     atomic_set(&group->tiler_oom, 0);
> +     cancel_work(&group->tiler_oom_work);

[Severity: High]
This is a pre-existing issue, but does this leak the group reference if
cancel_work() returns true? 

When a tiler out-of-memory (OOM) event occurs, 
cs_slot_process_tiler_oom_event_locked() acquires a reference to the group
and queues tiler_oom_work. 

If the group is unbound before the work executes, group_unbind_locked()
calls cancel_work(&group->tiler_oom_work). Because cancel_work() removes
the pending work from the queue without executing the work function, the
delegated group_put() is skipped. 

Should we check the return value of cancel_work() and call group_put(group)
if it returns true?

> +
> +     for (u32 i = 0; i < group->queue_count; i++)
> +             group->queues[i]->doorbell_id = -1;
> +
> +     group_put(group);
> +     return 0;
> +}

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=7

Reply via email to