Thank you for your contribution! Sashiko AI review found 4 potential issue(s) to consider:
Pre-existing issues: - [Critical] Missing bounds check on a continuation record's length leads to massive integer underflow and a kernel panic. - [High] Memory leak in `r535_gsp_rpc_push()` when an error occurs while handling a large payload that requires chunking. - [High] The `r535_gsp_msgq_recv` function returns `NULL` on an invalid RPC length, which is incorrectly propagated as a success state. - [High] Data corruption and uninitialized memory gaps occur when a large continuation record wraps around the message queue ring buffer. -- commit 1bc0e8927764cf75b9678605ae80d0c1fd535eb5 Author: Hongling Zeng <[email protected]> Subject: nouveau/gsp/rpc: Document RPC function return value contracts This commit adds kernel-doc comments to document the return value semantics of RPC functions in r535/rpc.c. It clarifies which functions return NULL, error pointers, or both to help validate future changes. > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > index 3ca3de8f43408..b4b2b5c00ab4e 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c [ ... ] > @@ -229,6 +238,14 @@ struct r535_gsp_msg_info { > static void > r535_gsp_msg_dump(struct nvkm_gsp *gsp, struct nvfw_gsp_rpc *msg, int lvl); > > +/** > + * r535_gsp_msgq_recv_one_elem - Receive one message element from GSP queue > + * @gsp: GSP device > + * @info: Message queue information > + * > + * Return: Pointer to received buffer on success, or ERR_PTR() on error. > + * Never returns NULL. > + */ > static void * > r535_gsp_msgq_recv_one_elem(struct nvkm_gsp *gsp, [Severity: Critical] This is a pre-existing issue, but is there a missing bounds check on the continuation record length here? In r535_gsp_msgq_recv(), the length of a continuation record is read directly from the hardware: info.gsp_rpc_len = rpc->length; info.continuation = true; If a compromised or buggy GSP firmware returns a length smaller than sizeof(struct nvfw_gsp_rpc) (32 bytes), len becomes less than 32 in r535_gsp_msgq_recv_one_elem(). The subtraction: len - sizeof(struct nvfw_gsp_rpc) will underflow to a massive positive value, causing memcpy() to attempt an out-of-bounds copy and panic the kernel. [Severity: High] This is another pre-existing issue in this function, but does this code cause data corruption when a continuation record wraps around the ring buffer? When copying a continuation record, the 32-byte header is stripped in the first memcpy(): memcpy(buf, mqe->data + sizeof(struct nvfw_gsp_rpc), len - sizeof(struct nvfw_gsp_rpc)); However, if the element wraps around, the second memcpy() writes to buf + len: if (expected) { mqe = (void *)((u8 *)gsp->shm.msgq.ptr + 0x1000 + 0 * 0x1000); memcpy(buf + len, mqe, expected); } Should this second memcpy() be writing to buf + len - sizeof(struct nvfw_gsp_rpc) instead? Writing to buf + len leaves a 32-byte gap of uninitialized memory and shifts the tail data forward, which will be overwritten by the next continuation payload. [ ... ] > @@ -283,6 +300,15 @@ r535_gsp_msgq_recv_one_elem(struct nvkm_gsp *gsp, > return buf; > } > > +/** > + * r535_gsp_msgq_recv - Receive RPC message(s) from GSP queue > + * @gsp: GSP device > + * @gsp_rpc_len: Expected RPC length > + * @retries: Retry counter > + * > + * Return: Pointer to received buffer on success, ERR_PTR() on error, > + * or NULL if RPC length is invalid. > + */ [Severity: High] This isn't a bug introduced by this patch, but does the documentation here cement a flawed design? If rpc->length > max_rpc_size, r535_gsp_msgq_recv() returns NULL. This propagates through r535_gsp_msg_recv() and r535_gsp_rpc_handle_reply(). Because NULL is also the expected success return value for POLL mode, callers like r535_gsp_rpc_poll() treat the invalid length error as a success. Other callers expecting ERR_PTR on error might dereference NULL. > static void * > r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, int *retries) [ ... ] > @@ -628,6 +704,17 @@ r535_gsp_rpc_get(struct nvkm_gsp *gsp, u32 fn, u32 > payload_size) > return rpc->data; > } > > +/** > + * r535_gsp_rpc_push - Push RPC message to GSP and wait for reply > + * @gsp: GSP device > + * @payload: RPC payload to send > + * @policy: Reply policy (NOWAIT, NOSEQ, RECV, or POLL) > + * @gsp_rpc_len: Expected RPC length in the reply > + * > + * Return: NULL when policy is NOWAIT/NOSEQ (no reply expected) or > + * POLL (successful message processing), pointer to reply data > + * on success, or ERR_PTR() on error. > + */ > static void * > r535_gsp_rpc_push(struct nvkm_gsp *gsp, void *payload, [Severity: High] This isn't a bug introduced by this patch, but does this function leak memory on the error path for large payloads? When payload_size > max_payload_size, the original msg buffer is not passed to r535_gsp_cmdq_push(), which would normally free it. Instead, it is manually freed at the end of the chunking process: if (!IS_ERR(repv)) kvfree(msg); If any step fails during the chunking process, such as r535_gsp_rpc_send() returning an ERR_PTR(), the code branches to the done label: if (IS_ERR(repv)) goto done; This bypasses the kvfree(msg) check at the end of the block, permanently leaking the msg allocation. -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
