Thank you for your contribution! Sashiko AI review found 3 potential issue(s) to consider:
New issues: - [Critical] The patch swallows error returns by unconditionally returning 0, leading to silent failure and a critical Use-After-Free of GEM objects. Pre-existing issues: - [High] Unexecuted `vm_ops` leak GEM object references when a VM bind job is aborted. - [High] Destructive VMA tree modification prior to allocation corrupts VM state and bypasses unusable marking on failure. -- commit f53acc63abb5c7329e3a7401e307a89d292891fa Author: Wentao Liang <[email protected]> drm/msm: fix refcount leak in msm_gem_vm_sm_step_remap() This patch attempts to fix a reference leak during VMA remap by routing failures to a common error path. The temporary reference on vm_bo is dropped before returning. Cc: [email protected] Fixes: 2e6a8a1fe2b2 ("drm/msm: Add VM_BIND ioctl") > diff --git a/drivers/gpu/drm/msm/msm_gem_vma.c > b/drivers/gpu/drm/msm/msm_gem_vma.c > index 1a952b171ed7f..69289bea7a665 100644 > --- a/drivers/gpu/drm/msm/msm_gem_vma.c > +++ b/drivers/gpu/drm/msm/msm_gem_vma.c > @@ -602,8 +602,10 @@ msm_gem_vm_sm_step_remap(struct drm_gpuva_op *op, void > *arg) > > if (op->remap.prev) { > prev_vma = vma_from_op(arg, op->remap.prev); [Severity: High] This is a pre-existing issue, but does closing orig_vma before attempting the fallible allocations for prev_vma and next_vma corrupt the VM state if those allocations fail? Since orig_vma is removed from the software tree, a subsequent failure here leaves the software state permanently desynchronized from the hardware. Additionally, if this happens on the first operation of a job in vm_bind_job_prepare(), it skips marking the VM as unusable, leaving it active despite the corrupted VMA tracking tree. > - if (WARN_ON(IS_ERR(prev_vma))) > - return PTR_ERR(prev_vma); > + if (WARN_ON(IS_ERR(prev_vma))) { > + ret = PTR_ERR(prev_vma); > + goto drop_ref; > + } > > vm_dbg("prev_vma: %p:%p: %016llx %016llx", vm, prev_vma, > prev_vma->va.addr, prev_vma->va.range); > to_msm_vma(prev_vma)->mapped = mapped; > @@ -612,8 +614,10 @@ msm_gem_vm_sm_step_remap(struct drm_gpuva_op *op, void > *arg) > > if (op->remap.next) { > next_vma = vma_from_op(arg, op->remap.next); > - if (WARN_ON(IS_ERR(next_vma))) > - return PTR_ERR(next_vma); > + if (WARN_ON(IS_ERR(next_vma))) { > + ret = PTR_ERR(next_vma); > + goto drop_ref; > + } > > vm_dbg("next_vma: %p:%p: %016llx %016llx", vm, next_vma, > next_vma->va.addr, next_vma->va.range); > to_msm_vma(next_vma)->mapped = mapped; > @@ -623,6 +627,7 @@ msm_gem_vm_sm_step_remap(struct drm_gpuva_op *op, void > *arg) > if (!mapped) > drm_gpuvm_bo_evict(vm_bo, true); > > +drop_ref: > /* Drop the previous ref: */ > drm_gpuvm_bo_put(vm_bo); > > return 0; [Severity: Critical] Does this error path inadvertently return 0 instead of the error code stored in ret? By unconditionally returning 0, the caller (vm_bind_job_prepare) will assume the remap succeeded and submit the job, causing hardware and software tracking to desynchronize. Because orig_vma was closed and the new VMAs were not inserted, the GPUVM drops all references to the backing GEM object, leading to a use-after-free when the hardware still retains valid page table entries. Also, if this is changed to return ret, could it return an uninitialized value if the function succeeds but mapped is false (and thus ret was never assigned)? [ ... ] [Severity: High] This is a pre-existing issue, but does the cleanup loop in msm_vma_job_free() leak GEM object references? When a VM operation is enqueued via vm_op_enqueue(), an extra reference is taken on op->obj. If the job fails during preparation and is aborted, the cleanup code frees the operation structures but does not appear to drop the reference: drivers/gpu/drm/msm/msm_gem_vma.c:msm_vma_job_free() { ... /* In error paths, we could have unexecuted ops: */ while (!list_empty(&job->vm_ops)) { struct msm_vm_op *op = list_first_entry(&job->vm_ops, struct msm_vm_op, node); list_del(&op->node); kfree(op); } ... } Could this allow unprivileged userspace to exhaust kernel memory by submitting jobs that intentionally fail after enqueuing operations? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
