t-tye added a comment.

In D68578#1700652 <https://reviews.llvm.org/D68578#1700652>, @tra wrote:

> In D68578#1698864 <https://reviews.llvm.org/D68578#1698864>, @t-tye wrote:
>
> > From a source language point of view, the device function comprises the 
> > code that is launched as a grid. We need this fact to be present in the 
> > symbols used. Only the device function should have a symbol name matching 
> > the mangled name of the device function.
>
>
> What do you have in mind when you use 'symbol name' here? Is that a symbol as 
> seen by linker? If that's the case, do host and device share this name space 
> on AMD GPUs? In case of CUDA, linker symbols are per-target (i.e. host and 
> each GPU have their own spaces), so they never clash, but the kernel names 
> must have identical mangled name on host and all devices, so the host can 
> refer to the device-side kernel when it needs to launch it.


We want to support a heterogeneous gdb debugger for a single source programming 
language. We would like to follow the same conventions used by compilers that 
implement other languages supported by gdb. The debugger can use symbols to 
find functions. It supports unmangling them and using the unmangled name to 
indicate the source language function it corresponds to. We would like this to 
remain true. The stub is not the kernel function, it is a helper function that 
will launch the kernel. In many ways it is acting like other trampolines. 
Therefore, it should be named as a internal helper function. The debugger can 
chose what it wants to do with it, but it does not want to be confused into 
thinking it actually IS the kernel function. If the user sets a breakpoint in 
the code of the kernel function then that breakpoint should be hit by every 
instance of the kernel that is created by the dispatch. It should not be hit by 
the code that is initiatig the dispatch. If that is what the user wanted they 
would set a breakpoint at the statement that performs the dispatch launch.

Whether the kernel is present in the CPU or GPU code is s separate concept. If 
it is present in both, then both would have the same symbol as they are both 
implementing the kernel. The debugger would set a breakpoint in both as from a 
language execution model poit of view if either piece of code executes it 
corresponds to the same source language kernel.

> 
> 
>> It the device function has both a host and device implementation then both 
>> can have the source language function name for the symbol since both 
>> actually implement the device function. If the user asks to set a breakpoint 
>> in the device function then the debugger would set in both implementations 
>> so the user is notified when the source program executes the device 
>> function, regardless of which implementation is invoked. This is similar to 
>> the debugger setting a breakpoint in a function that is inlined into 
>> multiple places: the debugger sets breeakpoints in all the inlined places so 
>> the user can tstill think of the program debugging in terms of the source 
>> language semantics.
> 
> OK. This sounds like `__host__`/`__device__` function overloads and what 
> you're saying does make sense for that.

Right. Well its not really and overload, not a request to have instances of the 
kernel avaiable for either the CPU or GPU to execute. They are the same 
function, not different overloads.

> 
> 
>> In contrast, the stub is effectively part of the implementation of actually 
>> launching the device function. It should have a distinct name.
> 
> I'm not sure how the requirement of distinct name follows from the fact that 
> the stub is the host-side part of the device-side kernel? To me it looks like 
> an argument for them to have the same name so it's clear that they are both 
> part of the same function as written in the source.
> 
> The don't have to be different. CUDA (and HIP) does not allow overloading of 
> kernels, so the stub and the kernel can have identical names as in the 
> example of `__host__` and `__device__` overloads you've described above, only 
> now it's `__host__` stub + `__global__` kernel itself, instead of two 
> user-implemented functions. Debugger, of course, will need to know about that 
> to pick the stub or kernel as the breakpoint location, but that appears 
> doable.

As mentioned, the stub is not the host side part of the device side kernel. The 
stub is a means to launch the kernel. That launching could happen on the device 
(device enqueue), or on the host. The kernel itself could execute on the device 
or the host. There is the act of launching the kernel (the function call 
statement if you will), and the kernel instances that come into existence (the 
threads created to execute the body of the kernel according to the launch 
bounds presented at the launch statement).

The user may want to set a breakpoint at the launch statement, or in the body 
of the kernel. The language execution model treats those separately. The 
standard debugger expects the symbols to reflect the language constructs. Hence 
wanting the launch stub (which is compiler generated and not user written) to 
be distinct from the kernel body. If the compiler decided not to use a launch 
stub function (perhaps it is launching a kernel that will execute on the CPU 
and so does not need a helper function) then that is its choice. It is 
desirable that the debugger does not have to know about the choices made by the 
compiler. It simply wants to know that a symbol that appears to be a user 
source language construct is in fact exactly that. It does not have to do any 
compiler specific filtering.

When the debugger has DWARF available, it can use that to get a more accurate 
picture, but gdb does have the ability to fall back on just symbols, and it is 
that functionality we would like to preserve in the same manner as is done for 
other languages.

>> The debugger can still be used to set a breakpoint in it, or to step into 
>> it. But that should be done in terms of the stub name. If the debugger wants 
>> to support source language specific intelligence it can provide a helper 
>> library that understands the stub names. This helper library (similar to the 
>> thread helper library) can be used by the debugger to present a cleaner 
>> language view to the user. In fact OpenMP has also done this and provides a 
>> helper library called OMPD that can be used by tools such as a debugger to 
>> hide OpenMP trampoline functions etc.
> 
> Do I understand it correctly that giving the stub distinct name would 
> effectively get it out of the way when a breakpoint is set on the kernel? 
> I.e. it's essentially a work around the fact that debugger may not have 
> convenient way to specify "set breakpoint on this name in device code only". 
> Perhaps it would make sense to prove this ability as it sounds quite useful. 
> I.e I may want to set breakpoint on all inlined host/device functions, but 
> only on device side. That would be handy.

It is not really a work around. It is making the symbols reflect the reality of 
the source language program. The debugger can then simply trust that 
information and use it as gdb does for other languages.

Features such only break on this inlining of a function may be useful, but gdb 
does not currenty support that. Similarly, it does not support breakpoints 
based on the architecture. That could be simulated by having a conditional 
breakpoint that continues if the current thread architecture does not equal the 
chosen architecture. If such a feature were widely used it could be accelerated 
by adding architecture conditional breakpoints. I can add that to our list of 
suggestions.

> What happens if the stub and the kernel do have identical names?

The stub is compiler generated so should never have a name that can collide 
with a user name.

> My understanding, based on your comments above is that debugger does know 
> about host and device 'spaces' and that it can find pointers to both host and 
> device functions and set appropriate breakpoints for both. In this case it 
> would normally attempt to set breakpoint on both the stub and the kernel as 
> it would in case of `__host__`/`__device__` overloads you've described above. 
> In case of stub/kernel, we would want the breakpoint set only on the kernel 
> itself. Given that debugger does have ability to tell host and device 
> functions/symbols apart, the difficulty is really in being able to tell a 
> real host function from the stub, so we can skip it.
> 
> Is that indeed what we want/need? Is there something else?

As mentioned above. The desire is to have the compiler generate information in 
a standard way so the debugger can consume it in a standard way. If the user 
wants to set a breakpoint in the kernel, the compiler information should only 
lead the debugger to setting a breakpoint in the kernel, not in some other 
function that is used in the implementation of launching kernels. The user 
expects a kernel breakpoint to only be hit by the threads that execute the 
kernel instances created by the launch as that is how the language is defined. 
The debugger simply keeps track of what code objects are loaded, and what 
symbols they contain. It does not need to know the distinction between host and 
device code to implement the basic debugger functionality.

> Does debugger know that device-side function is a kernel? In case of CUDA, 
> the kernels are distinct from regular device-side functions. I don't know 
> whether that's the case for AMDGPU.
>  If debugger can tell that particular device function is a kernel, that can 
> be used to infer that the matching host-side symbol is a stub and skip 
> setting a breakpoint on it.
>  If that does not work, debugger presumably has access to the mangled symbols 
> for the potential breakpoint locations. The stub currently has distinct 
> `.stub` suffix. This can also be used to tell it apart from a regular 
> `__host__` function.

The debugger does not have to care if the symbol is for a kernel or a function, 
it will simply plonk a breakpoint in the code that corresponds to the symbol 
and report when it is hit. If the symbols follow the conventions used by other 
languages then the debugger does not have to do anything special to support a 
source language that happens to be executing on multiple devices.

> I do not see how changing the source-level name for the stub is going to 
> change things in principle. It's just yet another way to disambiguate a real 
> `__host__` function from a `host` stub we generate for the kernel.
>  Is there anything else about the stubs that requires them to have a name 
> different from the kernel?

The stub is not a source level function, it is a compiler generated function, 
and the desire is that it be named so as not to conflict with actual source 
level constructs as is done for other compiler generated entries.

> 
> 
>> I am a little unclear what this patch is doing as it is mentioned that the 
>> mangled name has a _stub in it.
> 
> Currently the mangled name has .stub suffix which is discarded during 
> unmangling, so unmangled names for the stub and the kernel end up being 
> identical. I'm trying to figure out why is it a problem to be fixed in the 
> compiler.

My understanding was that an earlier review rejected adding the suffix to the 
mangled name as it broke unmangling. It also does not seem the right thing to 
do as it does not follow the convention for oter compiler generate symbols.

>> My understanding is that the intention was to create a distinct unmangled 
>> name for the stub, and then mangle it so that the resulting symbol was a 
>> legal mangled name. It sounded like this was the preferred approach, and 
>> makes sense to me based on my current understanding. Am I understanding this 
>> correctly?
> 
> This patch proposes changing the source-level name for the stub. 
> Unfortunately the way it attempt to implement it is by doing the renaming 
> during mangling phase itself. This appears to be the wrong place to change 
> source-level name.

What do you think is the right place to do it?

> Before figuring out what would be the right thing to do here, I want to 
> understand why we're doing it. I appreciate your description of what drives 
> this requirement. I think I have petter idea of it now, but I still have some 
> questions. Please bear with me.

That makes complete sense. The desire is to have the debugger treat 
heterogeneous single source debugging in the same way as traditional CPU 
debugging. That the user experience is basically the same. By following the 
same conventions used for the CPU in the GPU, and implementing similar runtime 
controls, it allows a common debugger code base to support both with minimal 
change and ensure a consistent user experience. We would like to avoid adding 
special treatment to support the GPU in the debugger if following the existing 
conventions/standards will allow the existing code to simply work.

Hopefully the above responses help describe the motivation for this. If not let 
me know and thanks for taking the time to review.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68578/new/

https://reviews.llvm.org/D68578



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to