rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c

sashiko-bot Mon, 22 Jun 2026 01:27:43 -0700

Thank you for your contribution! Sashiko AI review found 5 potential issue(s) 
to consider:


Pre-existing issues:
- [Critical] Missing upper and lower bounds checks on the untrusted 
`rpc->length` field from GSP messages causes severe integer underflows and heap 
buffer overflows.
- [High] Returning NULL on message length error in `r535_gsp_msgq_recv()` leads 
to false success reporting.
- [High] Early returns on error paths in GSP message reception fail to advance 
the read pointer, causing a permanently deadlocked queue.
- [High] Memory leak of large GSP RPC messages on error paths in 
`r535_gsp_rpc_push`.
- [High] Missing read memory barrier (`smp_rmb`) when polling the GSP message 
queue write pointer allows out-of-order reads of stale payload data.
--

commit b9953ccb9eee21566520494158c392c95237ec17
Author: Hongling Zeng <[email protected]>

nouveau/gsp/rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c

This commit cleans up incorrect IS_ERR_OR_NULL() usage for functions
that never return NULL, such as r535_gsp_msgq_peek() and
r535_gsp_msgq_recv_one_elem(). It updates them to be checked with
IS_ERR() instead.

> diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c 
> b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> index b4b2b5c00ab4e..5b89c8835a2e4 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/subdev/gsp/rm/r535/rpc.c
> @@ -350,7 +350,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, 
> int *retries)
>               u32 size;
>  
>               rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), info.retries);

[Severity: High]
This isn't a bug introduced by this patch, but earlier in r535_gsp_msgq_recv(),
if the message length is invalid, it returns NULL:

        if (WARN_ON(rpc->length > max_rpc_size))
                return NULL;

Since callers like nvkm_gsp_rm_alloc_wr() use IS_ERR() to check for failures,
does this cause the caller to incorrectly treat the operation as a success
because IS_ERR(NULL) evaluates to false?

[Severity: High]
This is a pre-existing issue, but do the early returns on error paths in
GSP message reception fail to advance the read pointer?

For example, in r535_gsp_msgq_recv():

        if (WARN_ON(rpc->length > max_rpc_size))
                return NULL;

And in r535_gsp_msgq_recv_one_elem():

        if (rpc->function != NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD) {
                nvkm_error(&gsp->subdev,
                           "Not a continuation of a large RPC\n");
                r535_gsp_msg_dump(gsp, rpc, NV_DBG_ERROR);
                return ERR_PTR(-EIO);
        }

Since they return without advancing *gsp->msgq.rptr, will subsequent attempts
to process the queue peek the exact same malformed message and hit the same
error, causing a permanently deadlocked queue?

> -             if (IS_ERR_OR_NULL(rpc)) {
> +             if (IS_ERR(rpc)) {
>                       kvfree(buf);
>                       return rpc;
>               }
> @@ -359,7 +359,7 @@ r535_gsp_msgq_recv(struct nvkm_gsp *gsp, u32 gsp_rpc_len, 
> int *retries)
>               info.continuation = true;
>  
>               rpc = r535_gsp_msgq_recv_one_elem(gsp, &info);

[Severity: Critical]
This is a pre-existing issue, but does this code lack upper and lower bounds
checks on the untrusted rpc->length field from GSP messages?

If a compromised GSP provides a small length for a continuation record, it
looks like len - sizeof(struct nvfw_gsp_rpc) in r535_gsp_msgq_recv_one_elem()
could underflow:

        if (info->continuation)
                memcpy(buf, mqe->data + sizeof(struct nvfw_gsp_rpc),
                       len - sizeof(struct nvfw_gsp_rpc));

Additionally, a few lines below this hunk in r535_gsp_msgq_recv():

                size = info.gsp_rpc_len - sizeof(*rpc);
                expected -= size;
                info.gsp_rpc_buf += size;

Could this overflow the kvmalloc-allocated buffer since there is no upper
bound check ensuring rpc->length does not exceed expected?

> -             if (IS_ERR_OR_NULL(rpc)) {
> +             if (IS_ERR(rpc)) {
>                       kvfree(buf);
>                       return rpc;
>               }

[ ... ]

> @@ -499,7 +499,7 @@ r535_gsp_msg_recv(struct nvkm_gsp *gsp, int fn, u32 
> gsp_rpc_len)
>  
>  retry:
>       rpc = r535_gsp_msgq_peek(gsp, sizeof(*rpc), &retries);
> -     if (IS_ERR_OR_NULL(rpc))
> +     if (IS_ERR(rpc))
>               return rpc;
>  
>       rpc = r535_gsp_msgq_recv(gsp, gsp_rpc_len, &retries);

[Severity: High]
This is a pre-existing issue, but does r535_gsp_rpc_push() leak large GSP RPC
messages on error paths?

If the payload size exceeds max_payload_size, the message is chunked and the
original msg must be freed manually. However, the error paths seem to bypass
this:

r535_gsp_rpc_push() {
...
done:
        mutex_unlock(&gsp->cmdq.mutex);
        return repv;
}

If any intermediate chunk allocation or sending step fails and triggers the
goto done path, does this bypass the kvfree(msg) call and leak the large
message allocation?

[Severity: High]
This isn't a bug introduced by this patch, but does r535_gsp_msgq_wait()
miss a read memory barrier when polling the GSP message queue write pointer?

r535_gsp_msgq_wait() {
...
        if (WARN_ON(!*ptime))
                return -ETIMEDOUT;

        return used;
}

Since there is no smp_rmb() between checking the write pointer and reading
the payload from shared memory, could the CPU speculatively execute the
payload reads before the pointer read completes on weakly ordered architectures,
observing stale memory?

-- 
Sashiko AI review · 
https://sashiko.dev/#/patchset/[email protected]?part=2

Re: [PATCH v5 2/5] nouveau/gsp/rpc: Cleanup incorrect IS_ERR_OR_NULL in rpc.c

Reply via email to