On 2/4/26 1:56 PM, Dave Airlie wrote:
> On Thu, 5 Feb 2026 at 07:36, Timur Tabi wrote:
...
>> So are you saying that some RPC commands need to have a sequence number set,
>> and some do not?
>
> I'm copying the behaviour of opengpu here,
> src/kernel/gpu/gsp/kernel_gsp.c
> if (pSequence)
>     vgpu_rpc_message_header_v->sequence = *pSequence = pRpc->sequence++;
> else
>     vgpu_rpc_message_header_v->sequence = 0;
>
> src/kernel/vgpu/rpc.c:_issueRpcAsync
> doesn't pass pSequence
> _issueRpcAndWait does pass it.
>
> The SetSystemInfo and SetRegistry are the two async calls in nouveau.
>
> So I'm not saying some RPC commands need to have a sequence number and
> some don't, that would be up to someone who can access GSP source
> code, I'm saying that opengpu does this and I'm copying it :-)
>

lol I love it. If only someone here had access to the GSP code and
the GSP team! :)

OK, then, I just had a very enlightening call with one of our GSP
RPC experts, in order to learn what the sequence number story really
is. The notes below are sort of Nova-centric, but they should apply
equally to Nouveau.

Let me Cc Eliot, since he has also been fixing up other aspects of
GSP RPC calls on Nova, in case this overlaps.

==========================================================

Background
==========

Today there are some loose ends and inconsistencies in how sequence
numbers are used, even in the Open RM + GSP scenario. These are being
cleaned up and fixed. In fact, I was even able to request, and receive,
some nice clean behavior, which will be implemented in GSP soon (we'll
get it when we upgrade, likely sometime this year).

Today, there are 2+ sequence number spaces: one for send/receive
(command + response) RPC pairs, and another for GSP-initiated
("async") messages to the CPU.

The "2+" is because of an inconsistency (which will be fixed in GSP)
that puts the first two, very early, RPC calls in yet another,
unaccounted-for number space. These two:

    NovaCore 0000:01:00.0: GSP RPC: send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
    NovaCore 0000:01:00.0: GSP RPC: send: seq# 1, function=SET_REGISTRY, length=0xc5

...are not counted by GSP.

The GSP finally starts counting up when it gets the first "non-async"
(command/response) message, here:

    NovaCore 0000:01:00.0: GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GetGspStaticInfo), length=0x6c8

But even here, it's not what I think we want: for command/response
pairs, the CPU should get back the same seq number that it sent. That's
not what actually happens (at least not directly): GET_GSP_STATIC_INFO
goes out as seq# 2, but its response comes back as seq# 0.
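
Once GSP echoes the command's seq# back in its response, checking it
on the CPU side is cheap. A minimal C sketch of that check follows; the
names are hypothetical and not taken from Nova (which is Rust),
Nouveau, or Open RM. It assumes a 32-bit seq# that may wrap, and treats
a response carrying an older seq# as a late reply to a command the CPU
already gave up on:

```c
#include <stdint.h>

/*
 * Hypothetical sketch, not Nova/Nouveau/Open RM code. Assumes GSP echoes
 * the command's seq# back in the matching response.
 */
enum gsp_resp_verdict {
	GSP_RESP_MATCH,		/* response to the command we are waiting on */
	GSP_RESP_STALE,		/* late response to an earlier command */
	GSP_RESP_UNEXPECTED,	/* seq# the CPU has not sent yet */
};

static enum gsp_resp_verdict gsp_check_response(uint32_t awaited_seq,
						 uint32_t recv_seq)
{
	/* Unsigned subtraction keeps the comparison correct across wraparound. */
	uint32_t delta = recv_seq - awaited_seq;

	if (delta == 0)
		return GSP_RESP_MATCH;
	/* "Negative" distance: recv_seq was sent before awaited_seq. */
	if (delta > UINT32_MAX / 2)
		return GSP_RESP_STALE;
	return GSP_RESP_UNEXPECTED;
}
```

With that GSP behavior, the GET_GSP_STATIC_INFO exchange in the
"future" log below (sent as seq# 2, answered with seq# 2) would be
classified as GSP_RESP_MATCH.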

So for now, seq numbers on Nova and Nouveau are generally "do what
you want, it will work OK". But we will soon be able to use them for:

a) debugging aids,
b) detecting missing messages, and
c) recovering from "CPU sent a message, timed out waiting for a
   response" cases.
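
For b), the async space only needs a single "next expected" counter on
the CPU side. A minimal C sketch, with hypothetical names, assuming GSP
numbers its async messages 0, 1, 2, ... without resets:

```c
#include <stdint.h>

/*
 * Hypothetical sketch: track the GSP-initiated ("async") seq# space on the
 * CPU side. Assumes GSP numbers async messages 0, 1, 2, ... with no resets.
 */
struct gsp_async_tracker {
	uint32_t next_expected;	/* seq# of the next async message we expect */
};

/* Returns how many async messages were missed before this one (0 if none). */
static uint32_t gsp_async_received(struct gsp_async_tracker *t, uint32_t seq)
{
	uint32_t missed = seq - t->next_expected;

	/* Older than expected (duplicate or reordered): report 0, keep state. */
	if (missed > UINT32_MAX / 2)
		return 0;
	t->next_expected = seq + 1;
	return missed;
}
```

For example, if the CPU received async seq# 5 and then seq# 7, this
would report one missed message.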

==========================================================

Next steps for Nova (and maybe Nouveau, if not already done)
============================================================

a) Change the debug output to print which seq number space each
   received message belongs to: either "async message from GSP" or
   "command response from GSP".

b) Add a comment in the code noting that GSP_SET_SYSTEM_INFO and
   SET_REGISTRY do not yet participate in the incrementing seq number
   scheme, but will in future GSP versions.

I'll send patches for Nova soon to do the above.
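
For item a), the change amounts to picking a label from the message's
seq number space when logging. A minimal C sketch, assuming the caller
already knows whether a received message is a command response or a
GSP-initiated async message; the function name and parameters are
hypothetical (Nova itself is Rust), and the output mirrors the "future"
log lines shown below:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical sketch: format a received-message log line labeled with its
 * seq number space. Names are illustrative only.
 */
static int gsp_format_recv(char *buf, size_t len, bool is_response,
			   uint32_t seq, const char *function, uint32_t length)
{
	const char *space = is_response ? "response received" : "async received";

	return snprintf(buf, len, "GSP RPC: %s: seq# %u, function=%s, length=0x%x",
			space, (unsigned int)seq, function, (unsigned int)length);
}
```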

==========================================================

Examples to illustrate:
=======================

Today's Nova debug logs on Blackwell:

    NovaCore 0000:01:00.0: GSP RPC: send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
    NovaCore 0000:01:00.0: GSP RPC: send: seq# 1, function=SET_REGISTRY, length=0xc5
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GspLockdownNotice), length=0x51
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GspLockdownNotice), length=0x51
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(UcodeLibOsPrint), length=0x68
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(UcodeLibOsPrint), length=0x70
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GspPostNoCat), length=0x50c
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GspLockdownNotice), length=0x51
    ...more of these...
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GspInitDone), length=0x50
    NovaCore 0000:01:00.0: GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
    NovaCore 0000:01:00.0: GSP RPC: receive: seq# 0, function=Ok(GetGspStaticInfo), length=0x6c8

Future (after changing Nova first, and eventually upgrading GSP):

    NovaCore 0000:01:00.0: GSP RPC: send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
    NovaCore 0000:01:00.0: GSP RPC: send: seq# 1, function=SET_REGISTRY, length=0xc5
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 0, function=Ok(GspLockdownNotice), length=0x51
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 1, function=Ok(GspLockdownNotice), length=0x51
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 2, function=Ok(UcodeLibOsPrint), length=0x68
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 3, function=Ok(UcodeLibOsPrint), length=0x70
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 4, function=Ok(GspPostNoCat), length=0x50c
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 5, function=Ok(GspLockdownNotice), length=0x51
    ...more of these...
    NovaCore 0000:01:00.0: GSP RPC: async received: seq# 12, function=Ok(GspInitDone), length=0x50
    NovaCore 0000:01:00.0: GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
    NovaCore 0000:01:00.0: GSP RPC: response received: seq# 2, function=Ok(GetGspStaticInfo), length=0x6c8
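
As you can see, the CPU side can then clearly track the two types of
messages separately, and match each response to the command it answers
(seq# 2 in both directions for GET_GSP_STATIC_INFO).

thanks,
--
John Hubbard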