Thank you for your contribution! Sashiko AI review found 3 potential issue(s) 
to consider:

New issues:
- [Critical] The patch swallows error returns by unconditionally returning 0, 
leading to silent failure and a critical Use-After-Free of GEM objects.

Pre-existing issues:
- [High] Unexecuted `vm_ops` leak GEM object references when a VM bind job is 
aborted.
- [High] Destructive VMA tree modification prior to allocation corrupts VM 
state and bypasses unusable marking on failure.
--

commit f53acc63abb5c7329e3a7401e307a89d292891fa
Author: Wentao Liang <[email protected]>

drm/msm: fix refcount leak in msm_gem_vm_sm_step_remap()

This patch attempts to fix a reference leak during VMA remap by routing
failures to a common error path. The temporary reference on vm_bo is
dropped before returning.

Cc: [email protected]
Fixes: 2e6a8a1fe2b2 ("drm/msm: Add VM_BIND ioctl")

> diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c 
> b/drivers/gpu/drm/msm/msm_gem_vma.c
> index 1a952b171ed7f..69289bea7a665 100644
> --- a/drivers/gpu/drm/msm/msm_gem_vma.c
> +++ b/drivers/gpu/drm/msm/msm_gem_vma.c
> @@ -602,8 +602,10 @@ msm_gem_vm_sm_step_remap(struct drm_gpuva_op *op, void 
> *arg)
>  
>       if (op->remap.prev) {
>               prev_vma = vma_from_op(arg, op->remap.prev);

[Severity: High]
This is a pre-existing issue, but does closing orig_vma before attempting
the fallible allocations for prev_vma and next_vma corrupt the VM state if
those allocations fail?

Since orig_vma is removed from the software tree, a subsequent failure here
leaves the software state permanently desynchronized from the hardware.

Additionally, if this happens on the first operation of a job in
vm_bind_job_prepare(), it skips marking the VM as unusable, leaving it
active despite the corrupted VMA tracking tree.

> -             if (WARN_ON(IS_ERR(prev_vma)))
> -                     return PTR_ERR(prev_vma);
> +             if (WARN_ON(IS_ERR(prev_vma))) {
> +                     ret = PTR_ERR(prev_vma);
> +                     goto drop_ref;
> +             }
>  
>               vm_dbg("prev_vma: %p:%p: %016llx %016llx", vm, prev_vma, 
> prev_vma->va.addr, prev_vma->va.range);
>               to_msm_vma(prev_vma)->mapped = mapped;
> @@ -612,8 +614,10 @@ msm_gem_vm_sm_step_remap(struct drm_gpuva_op *op, void 
> *arg)
>  
>       if (op->remap.next) {
>               next_vma = vma_from_op(arg, op->remap.next);
> -             if (WARN_ON(IS_ERR(next_vma)))
> -                     return PTR_ERR(next_vma);
> +             if (WARN_ON(IS_ERR(next_vma))) {
> +                     ret = PTR_ERR(next_vma);
> +                     goto drop_ref;
> +             }
>  
>               vm_dbg("next_vma: %p:%p: %016llx %016llx", vm, next_vma, 
> next_vma->va.addr, next_vma->va.range);
>               to_msm_vma(next_vma)->mapped = mapped;
> @@ -623,6 +627,7 @@ msm_gem_vm_sm_step_remap(struct drm_gpuva_op *op, void 
> *arg)
>       if (!mapped)
>               drm_gpuvm_bo_evict(vm_bo, true);
>  
> +drop_ref:
>       /* Drop the previous ref: */
>       drm_gpuvm_bo_put(vm_bo);
>  
>       return 0;

[Severity: Critical]
Does this error path inadvertently return 0 instead of the error code
stored in ret?

By unconditionally returning 0, the caller (vm_bind_job_prepare) will
assume the remap succeeded and submit the job, causing hardware and
software tracking to desynchronize. Because orig_vma was closed and the
new VMAs were not inserted, the GPUVM drops all references to the backing
GEM object, leading to a use-after-free when the hardware still retains
valid page table entries.

Also, if this is changed to return ret, could it return an uninitialized
value if the function succeeds but mapped is false (and thus ret was never
assigned)?

[ ... ]

[Severity: High]
This is a pre-existing issue, but does the cleanup loop in msm_vma_job_free()
leak GEM object references?

When a VM operation is enqueued via vm_op_enqueue(), an extra reference is
taken on op->obj. If the job fails during preparation and is aborted, the
cleanup code frees the operation structures but does not appear to drop the
reference:

drivers/gpu/drm/msm/msm_gem_vma.c:msm_vma_job_free() {
    ...
    /* In error paths, we could have unexecuted ops: */
    while (!list_empty(&job->vm_ops)) {
        struct msm_vm_op *op =
            list_first_entry(&job->vm_ops, struct msm_vm_op, node);
        list_del(&op->node);
        kfree(op);
    }
    ...
}

Could this allow unprivileged userspace to exhaust kernel memory by
submitting jobs that intentionally fail after enqueuing operations?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to