Thank you for your contribution! Sashiko AI review found 2 potential issue(s) 
to consider:

New issues:
- [High] Unconditional hw_sob_get() causes permanent reference leak and breaks 
hardware SOB reset

Pre-existing issues:
- [High] SOB reference leak on hl_hw_queue_schedule_cs error path
--

commit 3b275b62c13a1feac292163762a58fb995072f8e
Author: WenTao Liang <[email protected]>

fix: accel/habanalabs: hl_cs_signal_sob_wraparound_handler: missing hw_sob_get 
when need_reset is true and encaps_sig is false

When other_sob->need_reset is true and encaps_sig is false,
  hw_sob_put(other_sob) decrements the kref to 0, but the matching
  hw_sob_get(other_sob) is skipped because it is inside the encaps_sig
  block. The function returns other_sob with kref=0, causing a subsequent
  kref_put to underflow. Fix by adding hw_sob_get(other_sob) in the else
  branch.

Cc: [email protected]
Fixes: dadf17abb724 ("habanalabs: add support for encapsulated signals 
reservation")

> diff --git a/drivers/accel/habanalabs/common/command_submission.c 
> b/drivers/accel/habanalabs/common/command_submission.c
> index d51b4cf332b2a..ffcdf29b7e070 100644
> --- a/drivers/accel/habanalabs/common/command_submission.c
> +++ b/drivers/accel/habanalabs/common/command_submission.c
> @@ -1869,11 +1869,10 @@ int hl_cs_signal_sob_wraparound_handler(struct 
> hl_device *hdev, u32 q_idx,
>               if (other_sob->need_reset)
>                       hw_sob_put(other_sob);
>  
> -             if (encaps_sig) {
> +             if (encaps_sig)
>                       /* set reset indication for the sob */
>                       sob->need_reset = true;
> -                     hw_sob_get(other_sob);
> -             }
> +             hw_sob_get(other_sob);

[Severity: High]
Does this unconditional increment cause a permanent reference leak when
encaps_sig is false and need_reset is false?

For regular signals (encaps_sig == false), cs_cmpl->hw_sob is not updated to
point to the new SOB in 
drivers/accel/habanalabs/common/hw_queue.c:init_signal_cs():

    cs_cmpl->hw_sob = hw_sob;
    ...
    rc = hl_cs_signal_sob_wraparound_handler(hdev, q_idx, &hw_sob, 1, false);

Because cs_cmpl->hw_sob retains the old SOB, this new reference acquired here
in the wraparound handler is never tracked for release. This will prevent the
hardware SOB refcount from ever reaching 0, breaking the sync stream mechanism.

[Severity: High]
This is a pre-existing issue, but does the hl_hw_queue_schedule_cs() error
path leak the SOB reference if a check fails after acquiring it?

In drivers/accel/habanalabs/common/hw_queue.c:hl_hw_queue_schedule_cs(), if
hl_staged_cs_find_first() fails, it aborts after init_signal_wait_cs() has
already acquired a reference to the SOB:

    staged_cs = hl_staged_cs_find_first(hdev, cs->staged_sequence);
    if (!staged_cs) {
        ...
        rc = -EINVAL;
        goto unlock_cs_mirror;
    }

During the rollback, cs_release_sob_reset_handler() in
drivers/accel/habanalabs/common/command_submission.c skips releasing the
reference because cs->submitted is false:

    if (!hl_cs_cmpl->hw_sob || !cs->submitted)
        return;
    ...
    hw_sob_put(hl_cs_cmpl->hw_sob);

Could this be triggered by submitting an IOCTL with an invalid staged
submission sequence to cause a denial of service?

>  
>               dev_dbg(hdev->dev, "switched to SOB %d, q_idx: %d\n",
>                               prop->curr_sob_offset, q_idx);

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to