On Tue, 2025-08-26 at 17:03 -0600, Rodrigo Siqueira wrote:
> On 08/26, Alex Deucher wrote:
> > On Mon, Aug 25, 2025 at 5:18 PM Timur Kristóf
> > <timur.kris...@gmail.com> wrote:
> > >
> > > On Mon, 2025-08-25 at 13:06 -0400, Alex Deucher wrote:
> > > > On Mon, Aug 25, 2025 at 12:39 PM Timur Kristóf
> > > > <timur.kris...@gmail.com> wrote:
> > > > >
> > > > > On Mon, 2025-08-25 at 12:31 -0400, Alex Deucher wrote:
> > > > > > On Mon, Aug 25, 2025 at 12:19 PM Timur Kristóf
> > > > > > <timur.kris...@gmail.com> wrote:
> > > > > > >
> > > > > > > On Mon, 2025-08-25 at 11:38 -0400, Alex Deucher wrote:
> > > > > > > > On Sun, Aug 24, 2025 at 7:43 PM Rodrigo Siqueira
> > > > > > > > <sique...@igalia.com> wrote:
> > > > > > > > >
> > > > > > > > > +
> > > > > > > > > +First of all, note that the GC can have multiple SEs, depending on the specific
> > > > > > > > > +GPU/APU, and each SE has multiple Compute Units (CU). From the diagram, you can
> > > > > > > > > +see that CUs have a block named Schedulers. The reason the name is in plural is
> > > > > > > > > +because this hardware block is a combination of different micro-schedules: CP,
> > > > > > > > > +CPF, CPC, and CPG.
> > > > > > > >
> > > > > > > > CP is not really in the same category as CPF, CPC, CPG. CP is
> > > > > > > > the front end to the GC block and contains a number of micro
> > > > > > > > controllers which run firmware which software interacts with.
> > > > > > > > CPF, CPG, and CPC are just hardware implementation details.
> > > > > > >
> > > > > > > Can you please suggest an edit that explains these better?
> > > > > > >
> > > > > > > I'm sorry to say, I thought I understood it but after reading
> > > > > > > your reply now I feel I don't.
> > > > > >
> > > > > > I would say something like:
> > > > > >
> > > > > > The CP (Command Processor) is the front end to the GC hardware.
> > > > > > It provides microcontrollers which manage command queues which
> > > > > > are used to feed jobs to the GFX and compute hardware.
> > > > >
> > > > > Sounds good. What do you think, Siqueira?
> > > > >
> > > > > > > > >
> > > > > > > > > +
> > > > > > > > >  The component that acts as the front end between the CPU and the GPU is called
> > > > > > > > > -the Command Processor (CP). This component is responsible for providing greater
> > > > > > > > > +CP (Command Processor). This component is responsible for providing greater
> > > > > > > > >  flexibility to the GC since CP makes it possible to program various aspects of
> > > > > > > > >  the GPU pipeline. CP also coordinates the communication between the CPU and GPU
> > > > > > > > >  via a mechanism named **Ring Buffers**, where the CPU appends information to
> > > > > > > > > -the buffer while the GPU removes operations. It is relevant to highlight that a
> > > > > > > > > -CPU can add a pointer to the Ring Buffer that points to another region of
> > > > > > > > > -memory outside the Ring Buffer, and CP can handle it; this mechanism is called
> > > > > > > > > -**Indirect Buffer (IB)**. CP receives and parses the Command Streams (CS), and
> > > > > > > > > -writes the operations to the correct hardware blocks.
> > > > > > > > > +the buffer while the GPU removes operations. Finally, CP is also responsible
> > > > > > > > > +for handling Indirect Buffers (IB).
> > > > > > > > > +
> > > > > > > > > +After CP completes the first set of processing, which includes separate command
> > > > > > > > > +packets specific to GFX and Compute, other blocks step in. To handle commands
> > > > > > > > > +for the compute block, CPC (Command Processor Command) takes over, and for
> > > > > > > > > +handling Graphics operations, the CPG (Command Processor Graphics) takes
> > > > > > > > > +action. Another essential block to ensure the optimal utilization of CPC and
> > > > > > > > > +CPG is the CPF (Command Processor Fetcher), which helps these blocks to be
> > > > > > > > > +constantly fed. Note that CPG contains the PFP (Pre-Fetch Parser), ME
> > > > > > > > > +(MicroEngine), and CE (Constant Engine) in the case of chips that support it.
> > > > > > > > > +CPC contains MEC (MicroEngine Compute), and CPF is another hardware block that
> > > > > > > > > +provides services to CPG and CPC.
> > > > > > > >
> > > > > > > > I'm not sure how much value this provides to the average developer.
> > > > > > > > These are sort of implementation details of the hardware. In
> > > > > > > > general the driver doesn't really interact with the individual
> > > > > > > > hardware blocks and they may not stay consistent over time.
> > > > > > > >
> > > > > > > > Alex
> > > > > > >
> > > > > > > Not sure what you mean by "the average developer", but I think
> > > > > > > this is very useful knowledge to anyone who wants to contribute
> > > > > > > to amdgpu, specifically to the parts that have anything to do
> > > > > > > with GFX or compute.
> > > > > > >
> > > > > > > If you're worried that it may not stay consistent over time, I
> > > > > > > think the glossary entries could be edited to mention which GPU
> > > > > > > generation(s) they apply to.
> > > > > > >
> > > > > > > As-is the code is full of 3-letter abbreviations that are never
> > > > > > > expanded or explained anywhere, which represent various hardware
> > > > > > > units (or microcontrollers, or blocks, or whatever they may be).
> > > > > > > Without knowing what these are and how they interact, it's
> > > > > > > difficult to understand what the code is doing and why, or even
> > > > > > > why some parts are necessary.
> > > > > > >
> > > > > > > To make matters worse, the latest public documentation that
> > > > > > > tries to explain any of this is from 2012. So I think it's a
> > > > > > > good idea to collect all of this information so that newcomers
> > > > > > > to the kernel driver such as myself have a chance.
> > > > > >
> > > > > > The driver/developers don't interact with CPF, CPC, CPG directly.
> > > > > > They just happen to be arbitrary sub-blocks of the CP. I'm
> > > > > > concerned that adding a lot of stuff about them will just lead to
> > > > > > confusion.
> > > > >
> > > > > I think they are worth a sentence or two each in the glossary.
> > > > >
> > > > > When trying to diagnose problems (e.g. GPU hangs), we often need to
> > > > > look at various HW registers (e.g. GRBM_STATUS), which refer to the
> > > > > above sub-blocks. It is then hard to see what is going on without
> > > > > knowing what these are. In turn, that makes it hard to come up with
> > > > > an understanding that can explain what is happening on the HW.
> > > > >
> > > >
> > > > I think that's fine. I just don't want to put too much emphasis on
> > > > them since they are more of an implementation detail within the CP.
> > > > They aren't quite the same as the other blocks that make up the GC
> > > > pipeline from a driver or debugging standpoint.
> > >
> > > I see your point.
> > >
> > > If you want to deemphasize these, how would you feel about mentioning
> > > them under the CP instead of giving them their own glossary entry?
> > >
> >
> > Sure. I think that is fine. How about something like:
> >
> > For reference, internally the CP consists of several sub-blocks (CPC -
> > CP compute, CPG - CP graphics, and CPF - CP fetcher). Some of these
> > acronyms appear in register names, but this is more of an
> > implementation detail and not something that directly impacts driver
> > programming or debugging.
> >
>
> Hi Alex, Timur,
>
> I attempted to incorporate all the points from the discussion into the
> version of the text below. The main points are:
>
> 1. Added a link to the CU image.
> 2. Removed the reference to CP from the micro-schedules part.
> 3. Rewrote the last paragraph just to mention components like CPG, CPC,
> etc.
>
> Let me know what you think.
>
> New version:
>
> .. kernel-figure:: cu.svg
> ===> https://people.igalia.com/siqueira/kernel-doc-imgs/cu.svg
>
I think this is mixed up and doesn't look right to me.

First, WGP (workgroup processor) is only relevant on GFX10+ (that is,
Navi 10 or newer). CU (compute unit) is something that all GCN and RDNA
GPUs have. These are already well documented publicly, and I don't think
we need another diagram for them.

For reference, you can find a diagram of a GCN CU on page 5 here:
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/vega-7nm-shader-instruction-set-architecture.pdf

And a diagram of an RDNA WGP on page 6 here:
https://www.amd.com/content/dam/amd/en/documents/radeon-tech-docs/instruction-set-architectures/rdna4-instruction-set-architecture.pdf

The above diagrams do mention command processors, although not in
detail. As you can see, the command processors are not part of a CU.
Rather, they can generate work for the various CUs/WGPs.

As far as I understand, there is only one of each CP per queue (or per
pipe; I am somewhat mixed up myself about the differences between a
queue and a pipe). For example, each compute queue has its own MEC, and
any MEC can launch work on any CU. Please correct me if I'm wrong about
it.

Timur

> First of all, note that the GC can have multiple SEs, depending on the
> specific GPU/APU, and each SE has multiple Compute Units (CU). From the
> diagram, you can see that CUs have a block named Schedulers. The reason
> the name is in plural is because this hardware block is a combination
> of different micro-schedules: CPF, CPC, and CPG.
>
> The component that acts as the front end between the CPU and the GPU is
> called CP (Command Processor). This component is responsible for
> providing greater flexibility to the GC since CP makes it possible to
> program various aspects of the GPU pipeline. CP also coordinates the
> communication between the CPU and GPU via a mechanism named **Ring
> Buffers**, where the CPU appends information to the buffer while the
> GPU removes operations. Finally, CP is also responsible for handling
> Indirect Buffers (IB).
>
> After CP completes the first set of processing, which includes separate
> command packets specific to GFX and Compute, other blocks step in. For
> reference, internally the CP consists of several sub-blocks (CPC - CP
> compute, CPG - CP graphics, and CPF - CP fetcher). Some of these
> acronyms appear in register names, but this is more of an
> implementation detail and not something that directly impacts driver
> programming or debugging.
>
> Thanks
>
> >
> > Alex
> >
> > > > > >
> > > > > > Documenting the micro controllers which run the firmwares makes
> > > > > > sense as those are how the driver interacts with the CP block.
> > > > > >
> > > > > > CE/PFP/ME - Microcontrollers which run the firmware that provides
> > > > > > the graphics command queues that the driver interacts with.
> > > > > > MEC - Microcontrollers which run the firmware that provides the
> > > > > > compute command queues that the driver interacts with.
> > > > > > MES - Microcontrollers which run the firmware that provides the
> > > > > > command queues that the driver uses to manage graphics and
> > > > > > compute command queues.
> > > > >
> > > > > I agree and I think most (all?) of these are already in the
> > > > > glossary. If not, they should be definitely added.
> > > > >
> > > > > Thanks & best regards,
> > > > > Timur
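
For readers who are new to the ring buffer and indirect buffer (IB) idea discussed in the quoted text, here is a minimal sketch in C. It is only a toy model and not amdgpu code: the names `struct ring`, `struct pkt`, `ring_push` and `cp_drain`, the packet layout, and the ring size are all hypothetical and chosen for illustration. The producer plays the role of the CPU appending packets, and the consumer plays the role of the CP removing them and following any pointer to memory outside the ring.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8u /* hypothetical size; must be a power of two */

enum pkt_type { PKT_OP, PKT_IB };

/* Hypothetical packet: either one inline operation or a pointer to an IB. */
struct pkt {
    enum pkt_type type;
    uint32_t op;        /* used when type == PKT_OP */
    const uint32_t *ib; /* used when type == PKT_IB: ops stored outside the ring */
    size_t ib_len;      /* number of ops in the IB */
};

struct ring {
    struct pkt slots[RING_SIZE];
    uint32_t wptr; /* write pointer, advanced only by the producer (CPU side) */
    uint32_t rptr; /* read pointer, advanced only by the consumer (CP side) */
};

/* Producer: append one packet; returns false when the ring is full. */
static bool ring_push(struct ring *r, struct pkt p)
{
    if (r->wptr - r->rptr == RING_SIZE)
        return false;
    r->slots[r->wptr & (RING_SIZE - 1)] = p;
    r->wptr++;
    return true;
}

/*
 * Consumer: remove every pending packet and copy the operations it carries
 * into out[], following IB pointers. Operations beyond cap are dropped in
 * this sketch. Returns the number of operations written to out[].
 */
static size_t cp_drain(struct ring *r, uint32_t *out, size_t cap)
{
    size_t n = 0;

    while (r->rptr != r->wptr) {
        const struct pkt *p = &r->slots[r->rptr & (RING_SIZE - 1)];

        if (p->type == PKT_OP) {
            if (n < cap)
                out[n++] = p->op;
        } else {
            for (size_t i = 0; i < p->ib_len && n < cap; i++)
                out[n++] = p->ib[i];
        }
        r->rptr++;
    }
    return n;
}
```

Pushing an inline op, then an IB holding three ops, then another inline op, and draining the ring yields the five operations in submission order. The real hardware packet formats and pointer handling are different; the sketch only shows why an IB lets a small ring entry refer to a much larger command stream.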