Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
Pre-existing issues: - [Critical] The device removal path forcibly frees `amdxdna_client` structures while their corresponding file descriptors remain open, causing systematic Use-After-Free (UAF) vulnerabilities in IOCTLs, mmap, and closure paths. - [High] AIE4 hardware context creation unconditionally sets the PASID as valid and sends an invalid PASID value to the firmware on non-IOMMU/carveout systems. - [High] In `amdxdna_drm_open()`, the newly allocated `client` structure is added to the global `xdna->client_list` before its initialization is complete, causing a TOCTOU race condition. - [High] The return value of init_srcu_struct() is ignored, potentially leading to a NULL pointer dereference on its per-CPU arrays under memory pressure. -- commit 38ba86620597fb9bb1ac8b2c38ecbef54add17c2 Author: Lizhi Hou <[email protected]> accel/amdxdna: Clear sva pointer after unbind Add client->sva = NULL after the unbind makes it consistent with how amdxdna_sva_fini() already clears the pointer after unbinding. The IS_ERR_OR_NULL guard in sva_fini will then correctly skip the second unbind. > diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c > b/drivers/accel/amdxdna/amdxdna_pci_drv.c > index c677293c1ae7f..70410c5e8fd47 100644 > --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c > +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c > @@ -87,6 +87,7 @@ static int amdxdna_sva_init(struct amdxdna_client *client) > client->pasid = iommu_sva_get_pasid(client->sva); > if (client->pasid == IOMMU_PASID_INVALID) { > iommu_sva_unbind_device(client->sva); > + client->sva = NULL; [Severity: High] This is a pre-existing issue, but I noticed that when the pasid is IOMMU_PASID_INVALID (which is a permitted state for physical address/carveout mode), aie4_hwctx_create() might send an invalid pasid to the firmware. drivers/accel/amdxdna/aie4_ctx.c:aie4_hwctx_create() { ... req.pasid = FIELD_PREP(AIE4_MSG_PASID, client->pasid) | FIELD_PREP(AIE4_MSG_PASID_VLD, 1); ... } Does this unconditionally mask client->pasid and set AIE4_MSG_PASID_VLD to 1? Sending an all-1s pasid with the valid bit set might lead to iommu faults or hardware hangs. AIE2 seems to check amdxdna_pasid_on() and passes 0 if the pasid is invalid. Should AIE4 do something similar? [Severity: Critical] This isn't a bug introduced by this patch, but the device removal path appears to forcibly free amdxdna_client structures while their file descriptors might still be open. drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_remove() { ... while (client) { amdxdna_client_cleanup(client); client = list_first_entry_or_null(&xdna->client_list, ... ... } Since drm_dev_unplug() does not force the closure of open files, can vfs or drm file descriptors remain open after amdxdna_client_cleanup() frees the client memory? If the file descriptor is later closed by the process exiting, amdxdna_drm_close() accesses client->pid before calling drm_dev_enter(): drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_drm_close() { ... struct amdxdna_client *client = filp->driver_priv; ... XDNA_DBG(xdna, "closing pid %d", client->pid); ... } Similarly, ioctls like amdxdna_drm_get_info_ioctl() dereference the freed client structure without drm_dev_enter() protection. Can this lead to a use-after-free? [Severity: High] This is a pre-existing issue, but there might be a race condition in amdxdna_drm_open() during client initialization. drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_drm_open() { ... list_add_tail(&client->node, &xdna->client_list); mutex_unlock(&xdna->dev_lock); filp->driver_priv = client; client->filp = filp; ... } Since accel_open() avoids calling drm_dev_enter(), could amdxdna_remove() execute concurrently and find this partially initialized client in the list? If amdxdna_remove() frees the client before amdxdna_drm_open() finishes, could the assignment to filp->driver_priv result in a dangling pointer? [Severity: High] This isn't a bug introduced by this patch, but the return value of init_srcu_struct() seems to be ignored. drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_drm_open() { ... init_srcu_struct(&client->hwctx_srcu); ... } Since init_srcu_struct() allocates per-CPU memory and can return -ENOMEM, what happens if the allocation fails? Will the srcu_struct be left partially initialized? If subsequent ioctls invoke srcu_read_lock(&client->hwctx_srcu), could this dereference unallocated pointers and cause a kernel panic? > XDNA_ERR(xdna, "SVA get pasid failed"); > return -ENODEV; > } -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
