Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
New issues: - [High] Unconditional hw_sob_get() causes permanent reference leak and breaks hardware SOB reset Pre-existing issues: - [High] SOB reference leak on hl_hw_queue_schedule_cs error path -- commit 3b275b62c13a1feac292163762a58fb995072f8e Author: WenTao Liang <[email protected]> fix: accel/habanalabs: hl_cs_signal_sob_wraparound_handler: missing hw_sob_get when need_reset is true and encaps_sig is false When other_sob->need_reset is true and encaps_sig is false, hw_sob_put(other_sob) decrements the kref to 0, but the matching hw_sob_get(other_sob) is skipped because it is inside the encaps_sig block. The function returns other_sob with kref=0, causing a subsequent kref_put to underflow. Fix by adding hw_sob_get(other_sob) in the else branch. Cc: [email protected] Fixes: dadf17abb724 ("habanalabs: add support for encapsulated signals reservation") > diff --git a/drivers/accel/habanalabs/common/command_submission.c > b/drivers/accel/habanalabs/common/command_submission.c > index d51b4cf332b2a..ffcdf29b7e070 100644 > --- a/drivers/accel/habanalabs/common/command_submission.c > +++ b/drivers/accel/habanalabs/common/command_submission.c > @@ -1869,11 +1869,10 @@ int hl_cs_signal_sob_wraparound_handler(struct > hl_device *hdev, u32 q_idx, > if (other_sob->need_reset) > hw_sob_put(other_sob); > > - if (encaps_sig) { > + if (encaps_sig) > /* set reset indication for the sob */ > sob->need_reset = true; > - hw_sob_get(other_sob); > - } > + hw_sob_get(other_sob); [Severity: High] Does this unconditional increment cause a permanent reference leak when encaps_sig is false and need_reset is false? For regular signals (encaps_sig == false), cs_cmpl->hw_sob is not updated to point to the new SOB in drivers/accel/habanalabs/common/hw_queue.c:init_signal_cs(): cs_cmpl->hw_sob = hw_sob; ... rc = hl_cs_signal_sob_wraparound_handler(hdev, q_idx, &hw_sob, 1, false); Because cs_cmpl->hw_sob retains the old SOB, this new reference acquired here in the wraparound handler is never tracked for release. This will prevent the hardware SOB refcount from ever reaching 0, breaking the sync stream mechanism. [Severity: High] This is a pre-existing issue, but does the hl_hw_queue_schedule_cs() error path leak the SOB reference if a check fails after acquiring it? In drivers/accel/habanalabs/common/hw_queue.c:hl_hw_queue_schedule_cs(), if hl_staged_cs_find_first() fails, it aborts after init_signal_wait_cs() has already acquired a reference to the SOB: staged_cs = hl_staged_cs_find_first(hdev, cs->staged_sequence); if (!staged_cs) { ... rc = -EINVAL; goto unlock_cs_mirror; } During the rollback, cs_release_sob_reset_handler() in drivers/accel/habanalabs/common/command_submission.c skips releasing the reference because cs->submitted is false: if (!hl_cs_cmpl->hw_sob || !cs->submitted) return; ... hw_sob_put(hl_cs_cmpl->hw_sob); Could this be triggered by submitting an IOCTL with an invalid staged submission sequence to cause a denial of service? > > dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n", > prop->curr_sob_offset, q_idx); -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
