On Tue, 2025-08-26 at 17:03 -0600, Rodrigo Siqueira wrote:
> On 08/26, Alex Deucher wrote:
> > On Mon, Aug 25, 2025 at 5:18 PM Timur Kristóf
> > <timur.kris...@gmail.com> wrote:
> > > 
> > > On Mon, 2025-08-25 at 13:06 -0400, Alex Deucher wrote:
> > > > On Mon, Aug 25, 2025 at 12:39 PM Timur Kristóf
> > > > <timur.kris...@gmail.com> wrote:
> > > > > 
> > > > > On Mon, 2025-08-25 at 12:31 -0400, Alex Deucher wrote:
> > > > > > On Mon, Aug 25, 2025 at 12:19 PM Timur Kristóf
> > > > > > <timur.kris...@gmail.com> wrote:
> > > > > > > 
> > > > > > > On Mon, 2025-08-25 at 11:38 -0400, Alex Deucher wrote:
> > > > > > > > On Sun, Aug 24, 2025 at 7:43 PM Rodrigo Siqueira
> > > > > > > > <sique...@igalia.com> wrote:
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > +
> > > > > > > > > +First of all, note that the GC can have multiple SEs, depending on the specific
> > > > > > > > > +GPU/APU, and each SE has multiple Compute Units (CU). From the diagram, you can
> > > > > > > > > +see that CUs have a block named Schedulers. The reason the name is in plural is
> > > > > > > > > +because this hardware block is a combination of different micro-schedules: CP,
> > > > > > > > > +CPF, CPC, and CPG.
> > > > > > > > 
> > > > > > > > CP is not really in the same category as CPF, CPC, CPG.  CP is the
> > > > > > > > front end to the GC block and contains a number of microcontrollers
> > > > > > > > which run firmware that software interacts with.  CPF, CPG, and CPC
> > > > > > > > are just hardware implementation details.
> > > > > > > 
> > > > > > > Can you please suggest an edit that explains these better?
> > > > > > > 
> > > > > > > I'm sorry to say, I thought I understood it, but after reading your
> > > > > > > reply I now feel I don't.
> > > > > > 
> > > > > > I would say something like:
> > > > > > 
> > > > > > The CP (Command Processor) is the front end to the GC hardware.  It
> > > > > > provides microcontrollers that manage the command queues used to feed
> > > > > > jobs to the GFX and compute hardware.
> > > > > 
> > > > > Sounds good. What do you think, Siqueira?
> > > > > 
> > > > > > 
> > > > > > > 
> > > > > > > > 
> > > > > > > > > +
> > > > > > > > >  The component that acts as the front end between the CPU and the GPU is called
> > > > > > > > > -the Command Processor (CP). This component is responsible for providing greater
> > > > > > > > > +CP (Command Processor). This component is responsible for providing greater
> > > > > > > > >  flexibility to the GC since CP makes it possible to program various aspects of
> > > > > > > > >  the GPU pipeline. CP also coordinates the communication between the CPU and GPU
> > > > > > > > >  via a mechanism named **Ring Buffers**, where the CPU appends information to
> > > > > > > > > -the buffer while the GPU removes operations. It is relevant to highlight that a
> > > > > > > > > -CPU can add a pointer to the Ring Buffer that points to another region of
> > > > > > > > > -memory outside the Ring Buffer, and CP can handle it; this mechanism is called
> > > > > > > > > -**Indirect Buffer (IB)**. CP receives and parses the Command Streams (CS), and
> > > > > > > > > -writes the operations to the correct hardware blocks.
> > > > > > > > > +the buffer while the GPU removes operations. Finally, CP is also responsible
> > > > > > > > > +for handling Indirect Buffers (IB).
> > > > > > > > > +
> > > > > > > > > +After CP completes the first set of processing, which includes separate command
> > > > > > > > > +packets specific to GFX and Compute, other blocks step in. To handle commands
> > > > > > > > > +for the compute block, CPC (Command Processor Command) takes over, and for
> > > > > > > > > +handling Graphics operations, the CPG (Command Processor Graphics) takes
> > > > > > > > > +action. Another essential block to ensure the optimal utilization of CPC and
> > > > > > > > > +CPG is the CPF (Command Processor Fetcher), which helps these blocks to be
> > > > > > > > > +constantly fed. Note that CPG contains the PFP (Pre-Fetch Parser), ME
> > > > > > > > > +(MicroEngine), and CE (Constant Engine) in the case of chips that support it.
> > > > > > > > > +CPC contains MEC (MicroEngine Compute), and CPF is another hardware block that
> > > > > > > > > +provides services to CPG and CPC.
> > > > > > > > 
> > > > > > > > I'm not sure how much value this provides to the average developer.
> > > > > > > > These are sort of implementation details of the hardware.  In general
> > > > > > > > the driver doesn't really interact with the individual hardware blocks
> > > > > > > > and they may not stay consistent over time.
> > > > > > > > 
> > > > > > > > Alex
> > > > > > > 
> > > > > > > Not sure what you mean by "the average developer", but I think this is
> > > > > > > very useful knowledge to anyone who wants to contribute to amdgpu,
> > > > > > > specifically to the parts that have anything to do with GFX or compute.
> > > > > > > 
> > > > > > > If you're worried that it may not stay consistent over time, I think
> > > > > > > the glossary entries could be edited to mention which GPU
> > > > > > > generation(s) they apply to.
> > > > > > > 
> > > > > > > As-is, the code is full of 3-letter abbreviations that are never
> > > > > > > expanded or explained anywhere, which represent various hardware units
> > > > > > > (or microcontrollers, or blocks, or whatever they may be). Without
> > > > > > > knowing what these are and how they interact, it's difficult to
> > > > > > > understand what the code is doing and why, or even why some parts are
> > > > > > > necessary.
> > > > > > > 
> > > > > > > To make matters worse, the latest public documentation that tries to
> > > > > > > explain any of this is from 2012. So I think it's a good idea to
> > > > > > > collect all of this information so that newcomers to the kernel driver
> > > > > > > such as myself have a chance.
> > > > > > 
> > > > > > The driver/developers don't interact with CPF, CPC, CPG directly.
> > > > > > They just happen to be arbitrary sub-blocks of the CP.  I'm concerned
> > > > > > that adding a lot of stuff about them will just lead to confusion.
> > > > > 
> > > > > I think they are worth a sentence or two each in the glossary.
> > > > > 
> > > > > When trying to diagnose problems (e.g. GPU hangs), we often need to look
> > > > > at various HW registers (e.g. GRBM_STATUS), which refer to the above
> > > > > sub-blocks. It is then hard to see what is going on without knowing
> > > > > what these are. In turn, that makes it hard to come up with an
> > > > > understanding that can explain what is happening on the HW.
> > > > > 
> > > > 
> > > > I think that's fine.  I just don't want to put too much emphasis on
> > > > them since they are more of an implementation detail within the CP.
> > > > They aren't quite the same as the other blocks that make up the GC
> > > > pipeline from a driver or debugging standpoint.
> > > 
> > > I see your point.
> > > 
> > > If you want to deemphasize these, how would you feel about mentioning
> > > them under the CP instead of giving them their own glossary entry?
> > > 
> > 
> > Sure.  I think that is fine.  How about something like:
> > 
> > For reference, internally the CP consists of several sub-blocks (CPC -
> > CP compute, CPG - CP graphics, and CPF - CP fetcher).  Some of these
> > acronyms appear in register names, but this is more of an
> > implementation detail and not something that directly impacts driver
> > programming or debugging.
> 
> 
> Hi Alex, Timur,
> 
> I attempted to incorporate all the points from the discussion into the
> version of the text below. The main points are:
> 
> 1. Added a link to the CU image.
> 2. Removed the reference to CP from the micro-schedulers part.
> 3. Rewrote the last paragraph to only mention components like CPG, CPC,
> etc.
> 
> Let me know what you think.
> 
> New version:
> 
> .. kernel-figure:: cu.svg
> ===> https://people.igalia.com/siqueira/kernel-doc-imgs/cu.svg
> 

I think this is mixed up and doesn't look right to me.

First, WGP (workgroup processor) is only relevant on GFX10+ (that is,
Navi 10 or newer). CU (compute unit) is something that all GCN and RDNA
GPUs have. These are already well documented publicly, and I don't
think we need another diagram for them.

For reference, you can find a diagram of a GCN CU on page 5 here:
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/vega-7nm-shader-instruction-set-architecture.pdf
And a diagram of an RDNA WGP on page 6 here:
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna4-instruction-set-architecture.pdf

The above diagrams do mention command processors, although not in
detail. As you can see, the command processors are not part of a CU.
Rather, they can generate work for the various CUs/WGPs.

As far as I understand, there is only one of each CP per queue (or per
pipe; I am somewhat mixed up myself about the differences between a
queue and a pipe). For example, each compute queue has its own MEC, and
any MEC can launch work on any CU.

Please correct me if I'm wrong about it.

Timur


> First of all, note that the GC can have multiple SEs, depending on the
> specific GPU/APU, and each SE has multiple Compute Units (CU). From the
> diagram, you can see that CUs have a block named Schedulers. The name is
> plural because this hardware block is a combination of different
> micro-schedulers: CPF, CPC, and CPG.
> 
> The component that acts as the front end between the CPU and the GPU is
> called CP (Command Processor). This component provides greater
> flexibility to the GC, since CP makes it possible to program various
> aspects of the GPU pipeline. CP also coordinates the communication
> between the CPU and GPU via a mechanism named **Ring Buffers**, where the
> CPU appends commands to the buffer while the GPU removes and processes
> them. Finally, CP is also responsible for handling Indirect Buffers (IB).
> 
> After CP completes the first set of processing, which includes separate
> command packets specific to GFX and Compute, other blocks step in. For
> reference, internally the CP consists of several sub-blocks (CPC - CP
> compute, CPG - CP graphics, and CPF - CP fetcher).  Some of these
> acronyms appear in register names, but this is more of an implementation
> detail and not something that directly impacts driver programming or
> debugging.
> 
> Thanks
> 
> > 
> > Alex
> > 
> > 
> > > > 
> > > > 
> > > > > > 
> > > > > > Documenting the microcontrollers which run the firmware makes sense,
> > > > > > as those are how the driver interacts with the CP block.
> > > > > > 
> > > > > > CE/PFP/ME - Microcontrollers which run the firmware that provides the
> > > > > > graphics command queues that the driver interacts with.
> > > > > > MEC - Microcontrollers which run the firmware that provides the
> > > > > > compute command queues that the driver interacts with.
> > > > > > MES - Microcontrollers which run the firmware that provides the
> > > > > > command queues that the driver uses to manage graphics and compute
> > > > > > command queues.
> > > > > 
> > > > > I agree, and I think most (all?) of these are already in the glossary.
> > > > > If not, they should definitely be added.
> > > > > 
> > > > > Thanks & best regards,
> > > > > Timur
