Thank you for your contribution! Sashiko AI review found 4 potential issue(s) 
to consider:

Pre-existing issues:
- [Critical] Missing bounds check on a continuation record's length leads to 
massive integer underflow and a kernel panic.
- [High] Memory leak in `r535_gsp_rpc_push()` when an error occurs while 
handling a large payload that requires chunking.
- [High] The `r535_gsp_msgq_recv` function returns `NULL` on an invalid RPC 
length, which is incorrectly propagated as a success state.
- [High] Data corruption and uninitialized memory gaps occur when a large 
continuation record wraps around the message queue ring buffer.
--

commit 1bc0e8927764cf75b9678605ae80d0c1fd535eb5
Author: Hongling Zeng <[email protected]>
Subject: nouveau/gsp/rpc: Document RPC function return value contracts

This commit adds kernel-doc comments to document the return value semantics
of RPC functions in r535/rpc.c. It clarifies which functions return NULL,
error pointers, or both to help validate future changes.

> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> index 3ca3de8f43408..b4b2b5c00ab4e 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c

[ ... ]

> @@ -229,6 +238,14 @@ struct r535_gsp_msg_info {
>  static void
>  r535_gsp_msg_dump(struct nvkm_gsp *gsp, struct nvfw_gsp_rpc *msg, int lvl);
>  
> +/**
> + * r535_gsp_msgq_recv_one_elem - Receive one message element from GSP queue
> + * @gsp: GSP device
> + * @info: Message queue information
> + *
> + * Return: Pointer to received buffer on success, or ERR_PTR() on error.
> + *         Never returns NULL.
> + */
>  static void *
>  r535_gsp_msgq_recv_one_elem(struct nvkm_gsp *gsp,

[Severity: Critical]
This is a pre-existing issue, but is there a missing bounds check on the
continuation record length here?

In r535_gsp_msgq_recv(), the length of a continuation record is read
directly from the hardware:

    info.gsp_rpc_len = rpc->length;
    info.continuation = true;

If a compromised or buggy GSP firmware returns a length smaller than
sizeof(struct nvfw_gsp_rpc) (32 bytes), len becomes less than 32 in
r535_gsp_msgq_recv_one_elem(). The subtraction:

    len - sizeof(struct nvfw_gsp_rpc)

will underflow to a massive positive value, causing memcpy() to attempt
an out-of-bounds copy and panic the kernel.

[Severity: High]
This is another pre-existing issue in this function, but does this code
cause data corruption when a continuation record wraps around the ring
buffer?

When copying a continuation record, the 32-byte header is stripped in the
first memcpy():

    memcpy(buf, mqe->data + sizeof(struct nvfw_gsp_rpc),
           len - sizeof(struct nvfw_gsp_rpc));

However, if the element wraps around, the second memcpy() writes to
buf + len:

    if (expected) {
            mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000);
            memcpy(buf + len, mqe, expected);
    }

Should this second memcpy() be writing to buf + len - sizeof(struct 
nvfw_gsp_rpc)
instead? Writing to buf + len leaves a 32-byte gap of uninitialized memory
and shifts the tail data forward, which will be overwritten by the next
continuation payload.

[ ... ]

> @@ -283,6 +300,15 @@ r535_gsp_msgq_recv_one_elem(struct nvkm_gsp *gsp,
>       return buf;
>  }
>  
> +/**
> + * r535_gsp_msgq_recv - Receive RPC message(s) from GSP queue
> + * @gsp: GSP device
> + * @gsp_rpc_len: Expected RPC length
> + * @retries: Retry counter
> + *
> + * Return: Pointer to received buffer on success, ERR_PTR() on error,
> + *         or NULL if RPC length is invalid.
> + */

[Severity: High]
This isn't a bug introduced by this patch, but does the documentation here
cement a flawed design?

If rpc->length > max_rpc_size, r535_gsp_msgq_recv() returns NULL. This
propagates through r535_gsp_msg_recv() and r535_gsp_rpc_handle_reply().
Because NULL is also the expected success return value for POLL mode,
callers like r535_gsp_rpc_poll() treat the invalid length error as a
success. Other callers expecting ERR_PTR on error might dereference NULL.

>  static void *
>  r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, int *retries)

[ ... ]

> @@ -628,6 +704,17 @@ r535_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 
> payload_size)
>       return rpc->data;
>  }
>  
> +/**
> + * r535_gsp_rpc_push - Push RPC message to GSP and wait for reply
> + * @gsp: GSP device
> + * @payload: RPC payload to send
> + * @policy: Reply policy (NOWAIT, NOSEQ, RECV, or POLL)
> + * @gsp_rpc_len: Expected RPC length in the reply
> + *
> + * Return: NULL when policy is NOWAIT/NOSEQ (no reply expected) or
> + *         POLL (successful message processing), pointer to reply data
> + *         on success, or ERR_PTR() on error.
> + */
>  static void *
>  r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload,

[Severity: High]
This isn't a bug introduced by this patch, but does this function leak memory
on the error path for large payloads?

When payload_size > max_payload_size, the original msg buffer is not passed
to r535_gsp_cmdq_push(), which would normally free it. Instead, it is manually
freed at the end of the chunking process:

    if (!IS_ERR(repv))
            kvfree(msg);

If any step fails during the chunking process, such as r535_gsp_rpc_send()
returning an ERR_PTR(), the code branches to the done label:

    if (IS_ERR(repv))
            goto done;

This bypasses the kvfree(msg) check at the end of the block, permanently
leaking the msg allocation.

-- 
Sashiko AI review ยท 
https://sashiko.dev/#/patchset/[email protected]?part=1

Reply via email to