Thank you for your contribution! Sashiko AI review found 5 potential issue(s) to consider:
Pre-existing issues: - [Critical] Missing upper and lower bounds checks on the untrusted `rpc->length` field from GSP messages causes severe integer underflows and heap buffer overflows. - [High] Returning NULL on message length error in `r535_gsp_msgq_recv()` leads to false success reporting. - [High] Early returns on error paths in GSP message reception fail to advance the read pointer, causing a permanently deadlocked queue. - [High] Memory leak of large GSP RPC messages on error paths in `r535_gsp_rpc_push`. - [High] Missing read memory barrier (`smp_rmb`) when polling the GSP message queue write pointer allows out-of-order reads of stale payload data. -- commit b9953ccb9eee21566520494158c392c95237ec17 Author: Hongling Zeng <[email protected]> nouveau/gsp/rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c This commit cleans up incorrect IS_ERR_OR_NULL() usage for functions that never return NULL, such as r535_gsp_msgq_peek() and r535_gsp_msgq_recv_one_elem(). It updates them to be checked with IS_ERR() instead. > diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > index b4b2b5c00ab4e..5b89c8835a2e4 100644 > --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c > @@ -350,7 +350,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, > int *retries) > u32 size; > > rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), info.retries); [Severity: High] This isn't a bug introduced by this patch, but earlier in r535_gsp_msgq_recv(), if the message length is invalid, it returns NULL: if (WARN_ON(rpc->length > max_rpc_size)) return NULL; Since callers like nvkm_gsp_rm_alloc_wr() use IS_ERR() to check for failures, does this cause the caller to incorrectly treat the operation as a success because IS_ERR(NULL) evaluates to false? [Severity: High] This is a pre-existing issue, but do the early returns on error paths in GSP message reception fail to advance the read pointer? For example, in r535_gsp_msgq_recv(): if (WARN_ON(rpc->length > max_rpc_size)) return NULL; And in r535_gsp_msgq_recv_one_elem(): if (rpc->function != NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD) { nvkm_error(&gsp->subdev, "Not a continuation of a large RPC\n"); r535_gsp_msg_dump(gsp, rpc, NV_DBG_ERROR); return ERR_PTR(-EIO); } Since they return without advancing *gsp->msgq.rptr, will subsequent attempts to process the queue peek the exact same malformed message and hit the same error, causing a permanently deadlocked queue? > - if (IS_ERR_OR_NULL(rpc)) { > + if (IS_ERR(rpc)) { > kvfree(buf); > return rpc; > } > @@ -359,7 +359,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, > int *retries) > info.continuation = true; > > rpc = r535_gsp_msgq_recv_one_elem(gsp, &info); [Severity: Critical] This is a pre-existing issue, but does this code lack upper and lower bounds checks on the untrusted rpc->length field from GSP messages? If a compromised GSP provides a small length for a continuation record, it looks like len - sizeof(struct nvfw_gsp_rpc) in r535_gsp_msgq_recv_one_elem() could underflow: if (info->continuation) memcpy(buf, mqe->data + sizeof(struct nvfw_gsp_rpc), len - sizeof(struct nvfw_gsp_rpc)); Additionally, a few lines below this hunk in r535_gsp_msgq_recv(): size = info.gsp_rpc_len - sizeof(*rpc); expected -= size; info.gsp_rpc_buf += size; Could this overflow the kvmalloc-allocated buffer since there is no upper bound check ensuring rpc->length does not exceed expected? > - if (IS_ERR_OR_NULL(rpc)) { > + if (IS_ERR(rpc)) { > kvfree(buf); > return rpc; > } [ ... ] > @@ -499,7 +499,7 @@ r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 > gsp_rpc_len) > > retry: > rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), &retries); > - if (IS_ERR_OR_NULL(rpc)) > + if (IS_ERR(rpc)) > return rpc; > > rpc = r535_gsp_msgq_recv(gsp, gsp_rpc_len, &retries); [Severity: High] This is a pre-existing issue, but does r535_gsp_rpc_push() leak large GSP RPC messages on error paths? If the payload size exceeds max_payload_size, the message is chunked and the original msg must be freed manually. However, the error paths seem to bypass this: r535_gsp_rpc_push() { ... done: mutex_unlock(&gsp->cmdq.mutex); return repv; } If any intermediate chunk allocation or sending step fails and triggers the goto done path, does this bypass the kvfree(msg) call and leak the large message allocation? [Severity: High] This isn't a bug introduced by this patch, but does r535_gsp_msgq_wait() miss a read memory barrier when polling the GSP message queue write pointer? r535_gsp_msgq_wait() { ... if (WARN_ON(!*ptime)) return -ETIMEDOUT; return used; } Since there is no smp_rmb() between checking the write pointer and reading the payload from shared memory, could the CPU speculatively execute the payload reads before the pointer read completes on weakly ordered architectures, observing stale memory? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=2
