Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

Pre-existing issues:
- [Critical] The device removal path forcibly frees `amdxdna_client` structures 
while their corresponding file descriptors remain open, causing systematic 
Use-After-Free (UAF) vulnerabilities in IOCTLs, mmap, and closure paths.
- [High] AIE4 hardware context creation unconditionally sets the PASID as valid 
and sends an invalid PASID value to the firmware on non-IOMMU/carveout systems.
- [High] In `amdxdna_drm_open()`, the newly allocated `client` structure is 
added to the global `xdna->client_list` before its initialization is complete, 
causing a TOCTOU race condition.
- [High] The return value of init_srcu_struct() is ignored, potentially leading 
to a NULL pointer dereference on its per-CPU arrays under memory pressure.
--

commit 38ba86620597fb9bb1ac8b2c38ecbef54add17c2
Author: Lizhi Hou <[email protected]>

accel/amdxdna: Clear sva pointer after unbind

Add client->sva = NULL after the unbind makes it consistent with how
amdxdna_sva_fini() already clears the pointer after unbinding. The
IS_ERR_OR_NULL guard in sva_fini will then correctly skip the second
unbind.

> diff --git a/drivers/accel/amdxdna/amdxdna_pci_drv.c 
> b/drivers/accel/amdxdna/amdxdna_pci_drv.c
> index c677293c1ae7f..70410c5e8fd47 100644
> --- a/drivers/accel/amdxdna/amdxdna_pci_drv.c
> +++ b/drivers/accel/amdxdna/amdxdna_pci_drv.c
> @@ -87,6 +87,7 @@ static int amdxdna_sva_init(struct amdxdna_client *client)
>       client->pasid = iommu_sva_get_pasid(client->sva);
>       if (client->pasid == IOMMU_PASID_INVALID) {
>               iommu_sva_unbind_device(client->sva);
> +             client->sva = NULL;

[Severity: High]
This is a pre-existing issue, but I noticed that when the pasid is
IOMMU_PASID_INVALID (which is a permitted state for physical address/carveout
mode), aie4_hwctx_create() might send an invalid pasid to the firmware.

drivers/accel/amdxdna/aie4_ctx.c:aie4_hwctx_create() {
    ...
    req.pasid = FIELD_PREP(AIE4_MSG_PASID, client->pasid) |
                FIELD_PREP(AIE4_MSG_PASID_VLD, 1);
    ...
}

Does this unconditionally mask client->pasid and set AIE4_MSG_PASID_VLD to 1?
Sending an all-1s pasid with the valid bit set might lead to iommu faults or
hardware hangs.

AIE2 seems to check amdxdna_pasid_on() and passes 0 if the pasid is invalid.
Should AIE4 do something similar?

[Severity: Critical]
This isn't a bug introduced by this patch, but the device removal path
appears to forcibly free amdxdna_client structures while their file
descriptors might still be open.

drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_remove() {
    ...
    while (client) {
        amdxdna_client_cleanup(client);
        client = list_first_entry_or_null(&xdna->client_list, ...
    ...
}

Since drm_dev_unplug() does not force the closure of open files, can vfs
or drm file descriptors remain open after amdxdna_client_cleanup() frees
the client memory?

If the file descriptor is later closed by the process exiting,
amdxdna_drm_close() accesses client->pid before calling drm_dev_enter():

drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_drm_close() {
    ...
    struct amdxdna_client *client = filp->driver_priv;
    ...
    XDNA_DBG(xdna, "closing pid %d", client->pid);
    ...
}

Similarly, ioctls like amdxdna_drm_get_info_ioctl() dereference the freed
client structure without drm_dev_enter() protection. Can this lead to a
use-after-free?

[Severity: High]
This is a pre-existing issue, but there might be a race condition in
amdxdna_drm_open() during client initialization.

drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_drm_open() {
    ...
    list_add_tail(&client->node, &xdna->client_list);
    mutex_unlock(&xdna->dev_lock);

    filp->driver_priv = client;
    client->filp = filp;
    ...
}

Since accel_open() avoids calling drm_dev_enter(), could amdxdna_remove()
execute concurrently and find this partially initialized client in the list?

If amdxdna_remove() frees the client before amdxdna_drm_open() finishes,
could the assignment to filp->driver_priv result in a dangling pointer?

[Severity: High]
This isn't a bug introduced by this patch, but the return value of
init_srcu_struct() seems to be ignored.

drivers/accel/amdxdna/amdxdna_pci_drv.c:amdxdna_drm_open() {
    ...
    init_srcu_struct(&client->hwctx_srcu);
    ...
}

Since init_srcu_struct() allocates per-CPU memory and can return -ENOMEM,
what happens if the allocation fails? Will the srcu_struct be left
partially initialized?

If subsequent ioctls invoke srcu_read_lock(&client->hwctx_srcu), could
this dereference unallocated pointers and cause a kernel panic?

>               XDNA_ERR(xdna, "SVA get pasid failed");
>               return -ENODEV;
>       }

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to