yaxunl added a comment.

In D70172#1745998 <https://reviews.llvm.org/D70172#1745998>, @rnk wrote:

> Are we sure using both Itanium and MS C++ ABIs at the same time is really the 
> best way forward here? What are the constraints on CUDA that require the 
> Itanium ABI? I'm sure there are real reasons you can't just use the MS ABI as 
> is, but I'm curious what they are. Was there some RFC or design showing that 
> this is the right way forward?
>
> I wonder if it would be more productive to add new, more expansive 
> attributes, similar to `__attribute__((ms_struct))`, that tag class or 
> function decls as MS or Itanium C++ ABI. CUDA could then leverage this as 
> needed, and it would be much easier to construct test cases for MS/Itanium 
> interop. This is an expansion in scope, but it seems like it could be 
> generally useful, and if we're already going to enter the crazy world of 
> multiple C++ ABIs in a single TU, we might as well bite the bullet and do it 
> in a way that isn't specific to CUDA.


We are not using Itanium ABI when we do host compilation of CUDA/HIP on 
windows. During the host compilation on windows only MS C++ ABI is used.

This issue is not due to mixing MS ABI with Itanium ABI.

This issue arises from the delayed diagnostics for CUDA/HIP. Basically we do 
not want to emit certain diagnostics (e.g. error in inline assembly code) in 
`__host__` `__device__` functions to avoid clutter. We only want to emit such 
diagnostics once we are certain these functions will be emitted in IR.

To implement this, clang maintains a call graph. For each reference to a 
function, clang checks the current context. If it is evaluating context and it 
is a function, clang assumes the referenced function is callee and its context 
is the caller. Clang checks if the caller is known to be emitted (if it has 
body and external linkage). If not, clang adds this caller/callee pair to the 
call graph. If the caller is known to be emitted, clang will check if the 
callee is known to be emitted. If so, do nothing. If the callee is not known to 
be emitted, clang will eliminate it and all its callee from the call graph, and 
emits the delayed diagnostics associated with them.

You can see a caller is added to the call graph only if it is not known to be 
emitted. Therefore clang has an assert that if a callee is known to be emitted, 
it should not be in the call graph.

On windows, when vtable is known to be emitted for a class, clang does a body 
check for dtor of the class. It makes the dtor as the context, then checks the 
dtor. I think it is to emulate the situation that a deleting dtor is calling a 
normal dtor. This happens if the dtor is not defined since otherwise the dtor 
has already been checked. Since dtor is not defined yet, it is not known to be 
emitted and put into call graph. Later on, if the dtor is defined, it will be 
checked again. This time it is known to be emitted, then clang finds that it is 
in the call graph, then the assert fails.

So the issue is that clang incorrectly assume the dtor is not known to be 
emitted in the first check and put it in the call graph. To fix that, a map is 
added to Sema to tell clang that it is checking a deleting dtor which is 
supposed to be emitted even if it is not defined.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70172/new/

https://reviews.llvm.org/D70172



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to