Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

New issues:
- [High] Dropping dev_lock during amdxdna_remove() list iteration causes a 
Use-After-Free if a file is closed concurrently.
- [High] Deferring IOMMU SVA unbind to amdxdna_drm_close() causes a NULL 
pointer dereference in the IOMMU core.

Pre-existing issues:
- [High] Deferred GEM object freeing accesses destroyed IOMMU domain 
structures, causing a Use-After-Free.
- [High] Deferred GEM object freeing calls flush_workqueue() on a destroyed 
workqueue pointer.
--

commit ec7bb08ec6d7a0837cd580d24a3f76e7ed06a6c6
Author: Lizhi Hou <[email protected]>

accel/amdxdna: Fix amdxdna_client lifetime race during device removal

This patch changes the client cleanup logic during device removal.
It decouples hardware teardown from client cleanup to avoid a use-after-free
if amdxdna_drm_close() is called concurrently with amdxdna_remove().

> diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c 
> b/drivers/accel/amdxdna/amdxdna_pci_drv.c
> index c677293c1ae7f..3016db07c7cad 100644
> --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c
> +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
> @@ -167,18 +167,12 @@ static void amdxdna_drm_close(struct drm_device *ddev, 
> struct drm_file *filp)
>  {
>       struct amdxdna_client *client = filp->driver_priv;
>       struct amdxdna_dev *xdna = to_xdna_dev(ddev);
> -     int idx;
>  
>       XDNA_DBG(xdna, "closing pid %d", client->pid);
>  
> -     if (!drm_dev_enter(&xdna->ddev, &idx))
> -             return;
> -
>       mutex_lock(&xdna->dev_lock);
>       amdxdna_client_cleanup(client);

[Severity: High]
Since amdxdna_client_cleanup() is now deferred until file close, could this
trigger a NULL pointer dereference during SVA teardown?

When a device is removed, the PCI subsystem unlinks it from its IOMMU group.
Later, when this file is closed, amdxdna_sva_fini() calls 
iommu_sva_unbind_device().

Inside the IOMMU core, iommu_detach_device_pasid() dereferences the group
pointer:

        struct iommu_group *group = dev->iommu_group;

        mutex_lock(&group->mutex);

If the device was already removed, dev->iommu_group would be NULL, causing a
crash. Does the SVA unbind need to happen earlier during device removal?

>       mutex_unlock(&xdna->dev_lock);
> -
> -     drm_dev_exit(idx);
>  }
>  
> [ ... ]
> @@ -436,14 +430,8 @@ static void amdxdna_remove(struct pci_dev *pdev)
>       amdxdna_sysfs_fini(xdna);
>  
>       mutex_lock(&xdna->dev_lock);
> -     client = list_first_entry_or_null(&xdna->client_list,
> -                                       struct amdxdna_client, node);
> -     while (client) {
> -             amdxdna_client_cleanup(client);
> -
> -             client = list_first_entry_or_null(&xdna->client_list,
> -                                               struct amdxdna_client, node);
> -     }
> +     list_for_each_entry(client, &xdna->client_list, node)

[Severity: High]
Does this list_for_each_entry() loop have a use-after-free risk if the lock
is dropped internally?

The loop calls amdxdna_hwctx_remove_all(), which invokes aie2_hwctx_fini().
Inside aie2_hwctx_fini(), xdna->dev_lock is temporarily dropped to wait for
hardware idle:

        mutex_unlock(&xdna->dev_lock);

Since this patch removes the drm_dev_enter() safeguard from
amdxdna_drm_close(), a concurrent file close can acquire dev_lock while it
is dropped here. The concurrent close would remove the client from the list
via list_del() and free the client structure.

When the lock is re-acquired and list_for_each_entry() attempts to fetch the
next element, won't it dereference the freed client->node.next pointer?

> +             amdxdna_hwctx_remove_all(client);
>  
>       xdna->dev_info->ops->fini(xdna);
>       mutex_unlock(&xdna->dev_lock);

[Severity: High]
This is a pre-existing issue, but does this function leave xdna->domain
dangling, leading to a use-after-free on deferred file close?

During device removal, amdxdna_remove() calls amdxdna_iommu_fini(), which
frees the domain but leaves the pointer intact:

        iommu_domain_free(xdna->domain);

If the application keeps the file descriptor open, drm_gem_object_put() will
later invoke amdxdna_gem_obj_free() during cleanup. This calls
amdxdna_dma_unmap_bo():

        if (amdxdna_iova_on(xdna)) {
                size = iova_align(&xdna->iovad, abo->mem.size);
                iommu_unmap(xdna->domain, abo->mem.dma_addr, size);

Since xdna->domain is not NULL, amdxdna_iova_on() returns true, and
iommu_unmap() is called on the freed domain. Should the domain pointer
be set to NULL after being freed?


[Severity: High]
This is a pre-existing issue, but could deferred GEM cleanup trigger a
use-after-free on the notifier workqueue?

Earlier in amdxdna_remove(), destroy_workqueue(xdna->notifier_wq) frees the
workqueue structure.

When a process retains an open file descriptor past device removal, the DRM
core defers GEM cleanup to the eventual file close. When
drm_gem_object_put() calls amdxdna_gem_obj_free(), it unconditionally does:

        flush_workqueue(xdna->notifier_wq);

Since the workqueue was already destroyed and freed, won't this result in
a use-after-free crash?

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to