Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
New issues: - [High] Dropping dev_lock during amdxdna_remove() list iteration causes a Use-After-Free if a file is closed concurrently. - [High] Deferring IOMMU SVA unbind to amdxdna_drm_close() causes a NULL pointer dereference in the IOMMU core. Pre-existing issues: - [High] Deferred GEM object freeing accesses destroyed IOMMU domain structures, causing a Use-After-Free. - [High] Deferred GEM object freeing calls flush_workqueue() on a destroyed workqueue pointer. -- commit ec7bb08ec6d7a0837cd580d24a3f76e7ed06a6c6 Author: Lizhi Hou <[email protected]> accel/amdxdna: Fix amdxdna_client lifetime race during device removal This patch changes the client cleanup logic during device removal. It decouples hardware teardown from client cleanup to avoid a use-after-free if amdxdna_drm_close() is called concurrently with amdxdna_remove(). > diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c > b/drivers/accel/amdxdna/amdxdna_pci_drv.c > index c677293c1ae7f..3016db07c7cad 100644 > --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c > +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c > @@ -167,18 +167,12 @@ static void amdxdna_drm_close(struct drm_device *ddev, > struct drm_file *filp) > { > struct amdxdna_client *client = filp->driver_priv; > struct amdxdna_dev *xdna = to_xdna_dev(ddev); > - int idx; > > XDNA_DBG(xdna, "closing pid %d", client->pid); > > - if (!drm_dev_enter(&xdna->ddev, &idx)) > - return; > - > mutex_lock(&xdna->dev_lock); > amdxdna_client_cleanup(client); [Severity: High] Since amdxdna_client_cleanup() is now deferred until file close, could this trigger a NULL pointer dereference during SVA teardown? When a device is removed, the PCI subsystem unlinks it from its IOMMU group. Later, when this file is closed, amdxdna_sva_fini() calls iommu_sva_unbind_device(). Inside the IOMMU core, iommu_detach_device_pasid() dereferences the group pointer: struct iommu_group *group = dev->iommu_group; mutex_lock(&group->mutex); If the device was already removed, dev->iommu_group would be NULL, causing a crash. Does the SVA unbind need to happen earlier during device removal? > mutex_unlock(&xdna->dev_lock); > - > - drm_dev_exit(idx); > } > > [ ... ] > @@ -436,14 +430,8 @@ static void amdxdna_remove(struct pci_dev *pdev) > amdxdna_sysfs_fini(xdna); > > mutex_lock(&xdna->dev_lock); > - client = list_first_entry_or_null(&xdna->client_list, > - struct amdxdna_client, node); > - while (client) { > - amdxdna_client_cleanup(client); > - > - client = list_first_entry_or_null(&xdna->client_list, > - struct amdxdna_client, node); > - } > + list_for_each_entry(client, &xdna->client_list, node) [Severity: High] Does this list_for_each_entry() loop have a use-after-free risk if the lock is dropped internally? The loop calls amdxdna_hwctx_remove_all(), which invokes aie2_hwctx_fini(). Inside aie2_hwctx_fini(), xdna->dev_lock is temporarily dropped to wait for hardware idle: mutex_unlock(&xdna->dev_lock); Since this patch removes the drm_dev_enter() safeguard from amdxdna_drm_close(), a concurrent file close can acquire dev_lock while it is dropped here. The concurrent close would remove the client from the list via list_del() and free the client structure. When the lock is re-acquired and list_for_each_entry() attempts to fetch the next element, won't it dereference the freed client->node.next pointer? > + amdxdna_hwctx_remove_all(client); > > xdna->dev_info->ops->fini(xdna); > mutex_unlock(&xdna->dev_lock); [Severity: High] This is a pre-existing issue, but does this function leave xdna->domain dangling, leading to a use-after-free on deferred file close? During device removal, amdxdna_remove() calls amdxdna_iommu_fini(), which frees the domain but leaves the pointer intact: iommu_domain_free(xdna->domain); If the application keeps the file descriptor open, drm_gem_object_put() will later invoke amdxdna_gem_obj_free() during cleanup. This calls amdxdna_dma_unmap_bo(): if (amdxdna_iova_on(xdna)) { size = iova_align(&xdna->iovad, abo->mem.size); iommu_unmap(xdna->domain, abo->mem.dma_addr, size); Since xdna->domain is not NULL, amdxdna_iova_on() returns true, and iommu_unmap() is called on the freed domain. Should the domain pointer be set to NULL after being freed? [Severity: High] This is a pre-existing issue, but could deferred GEM cleanup trigger a use-after-free on the notifier workqueue? Earlier in amdxdna_remove(), destroy_workqueue(xdna->notifier_wq) frees the workqueue structure. When a process retains an open file descriptor past device removal, the DRM core defers GEM cleanup to the eventual file close. When drm_gem_object_put() calls amdxdna_gem_obj_free(), it unconditionally does: flush_workqueue(xdna->notifier_wq); Since the workqueue was already destroyed and freed, won't this result in a use-after-free crash? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
