Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues: - [High] Leaking `panthor_group` reference count when `tiler_oom_work` is successfully cancelled during group unbind. - [High] Ignored return value from `group_bind_locked()` in `tick_ctx_apply()` can lead to firmware state corruption and hardware hangs. -- commit 3008df4d374d0ccc21ff32d8af0ae14af924b9fd Author: Boris Brezillon <[email protected]> drm/panthor: Automate CSG IRQ processing at group unbind time This commit moves the sched_process_csg_irq_locked() call into group_unbind_locked() to automate CSG IRQ processing at group unbind time. This eliminates the need to manually call it in tick_ctx_apply() and panthor_sched_suspend(). The patch also relocates group_bind_locked() and group_unbind_locked() to avoid forward declarations. > diff --git a/drivers/gpu/drm/panthor/panthor_sched.c > b/drivers/gpu/drm/panthor/panthor_sched.c > index c8818718d1353..124999297aec9 100644 > --- a/drivers/gpu/drm/panthor/panthor_sched.c > +++ b/drivers/gpu/drm/panthor/panthor_sched.c [ ... ] > @@ -1972,6 +1892,88 @@ void panthor_sched_report_fw_events(struct > panthor_device *ptdev, u32 events) > } > } > > +/** > + * group_bind_locked() - Bind a group to a group slot > + * @group: Group. > + * @csg_id: Slot. > + * > + * Return: 0 on success, a negative error code otherwise. > + */ > +static int > +group_bind_locked(struct panthor_group *group, u32 csg_id) [Severity: High] This isn't a bug introduced by this patch, but could ignoring the return value of group_bind_locked() in tick_ctx_apply() lead to firmware state corruption? If group_bind_locked() fails (e.g., panthor_vm_active() encounters an MMU timeout), the error is ignored and csg_slot->group remains NULL. This causes csg_slot_prog_locked() to return immediately without configuring the hardware. However, tick_ctx_apply() then unconditionally queues a CSG_STATE_START or CSG_STATE_RESUME command to the firmware for this slot, despite it being unprogrammed. Can this trigger firmware faults or hardware hangs? > +{ > + struct panthor_device *ptdev = group->ptdev; > + int ret; > + [ ... ] > +static int > +group_unbind_locked(struct panthor_group *group) > +{ > + struct panthor_device *ptdev = group->ptdev; > + [ ... ] > + panthor_vm_idle(group->vm); > + > + /* Tiler OOM events will be re-issued next time the group is scheduled. > */ > + atomic_set(&group->tiler_oom, 0); > + cancel_work(&group->tiler_oom_work); [Severity: High] This is a pre-existing issue, but does this leak the group reference if cancel_work() returns true? When a tiler out-of-memory (OOM) event occurs, cs_slot_process_tiler_oom_event_locked() acquires a reference to the group and queues tiler_oom_work. If the group is unbound before the work executes, group_unbind_locked() calls cancel_work(&group->tiler_oom_work). Because cancel_work() removes the pending work from the queue without executing the work function, the delegated group_put() is skipped. Should we check the return value of cancel_work() and call group_put(group) if it returns true? > + > + for (u32 i = 0; i < group->queue_count; i++) > + group->queues[i]->doorbell_id = -1; > + > + group_put(group); > + return 0; > +} -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=7
