Author: Yaxun (Sam) Liu
Date: 2026-03-13T23:15:17-04:00
New Revision: cc4ff7fe4f71d18e666018dc270a43726bb3457a

URL: 
https://github.com/llvm/llvm-project/commit/cc4ff7fe4f71d18e666018dc270a43726bb3457a
DIFF: 
https://github.com/llvm/llvm-project/commit/cc4ff7fe4f71d18e666018dc270a43726bb3457a.diff

LOG: [CUDA/HIP][SYCL] Deduplicate deferred diagnostics across multiple callers 
(#185926)

[CUDA/HIP][SYCL] Deduplicate deferred diagnostics across multiple
callers

Deferred diagnostics for a function were emitted once per caller that
forced the function into device context. When multiple device functions
called the same host-device function containing errors, the diagnostics
were repeated for each caller, producing noisy duplicate output.

Change the deferred diagnostic emission to a two-pass approach:
1. During the call graph walk, collect callers in DeviceKnownEmittedFns
   (now storing multiple callers per function) and mark functions that
   need diagnostics, but don't emit yet.
2. After the walk completes, emit diagnostics once per function with
   all callers listed as notes.

Call chain notes now use "called by" for the first caller in each chain
and "which is called by" for subsequent callers in the chain, making it
easy to distinguish separate call chains.

Also add documentation for deferred diagnostics and the concept of
HD-promoted functions to the HIP and CUDA docs.

Fixes: https://github.com/llvm/llvm-project/issues/180638

Added: 
    clang/test/SemaCUDA/deferred-diags-dedup.cu

Modified: 
    clang/docs/HIPSupport.rst
    clang/include/clang/Basic/DiagnosticSemaKinds.td
    clang/include/clang/Sema/SemaCUDA.h
    clang/lib/Sema/Sema.cpp
    clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
    clang/test/SemaCUDA/deferred-diags-limit.cu
    clang/test/SemaCUDA/deferred-diags.cu
    clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
    llvm/docs/CompileCudaWithLLVM.rst

Removed: 
    


################################################################################
diff  --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index c2a91a3062bc3..e7f00202c34e7 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -484,6 +484,90 @@ non-constexpr function, which is by default a host 
function.
 Users can override the inferred host and device attributes of default
 destructors by adding explicit host and device attributes to them.
 
+Deferred Diagnostics
+====================
+
+In HIP (and CUDA), a ``__host__ __device__`` function can be called from both
+host and device code. Certain operations may not be valid on one side (e.g.,
+calling a host-only function from device code, or referencing a device-only
+function from host code). However, a ``__host__ __device__`` function
+containing such operations is only ill-formed if the function is actually
+reachable from a caller where the operation cannot be emitted.
+
+Clang handles this through *deferred diagnostics*: errors and warnings in
+``__host__ __device__`` functions are recorded during parsing but not emitted
+immediately. They are only emitted when a function whose linkage guarantees
+IR emission (e.g., a kernel, or an externally visible ``__host__`` or
+``__device__`` function) directly or indirectly calls the
+``__host__ __device__`` function containing the invalid operation.
+
+HD-Promoted Functions
+---------------------
+
+An *HD-promoted function* is a function that is implicitly or explicitly
+``__host__ __device__`` and is only emitted because a caller requires it —
+it does not have standalone linkage that guarantees emission. HD-promoted
+functions are the primary source of deferred diagnostics.
+
+Common examples of HD-promoted functions:
+
+- Lambdas without explicit ``__host__`` or ``__device__`` attributes
+- ``inline __host__ __device__`` helper functions
+- ``__host__ __device__`` template instantiations
+
+When an HD-promoted function contains operations that are not valid on the
+caller's side, clang emits the deferred diagnostics along with notes showing
+how the function was reached.
+
+Example
+^^^^^^^
+
+.. code-block:: c++
+
+   __host__ void host_only();
+
+   // This lambda is implicitly __host__ __device__. It is HD-promoted
+   // when called from a __device__ function.
+   __device__ auto lambda = [] {
+     host_only();  // error: only emitted if lambda is used from device code
+   };
+
+   __device__ void df1() {
+     lambda();  // triggers deferred diagnostic for lambda
+   }
+
+   __device__ void df2() {
+     lambda();  // same lambda, same error — not duplicated
+   }
+
+Clang emits the error once and lists all device callers:
+
+.. code-block:: text
+
+   error: reference to __host__ function 'host_only' in __host__ __device__ 
function
+     note: 'host_only' declared here
+     note: called by 'df1'
+     note: called by 'df2'
+
+Call Chain Notes
+^^^^^^^^^^^^^^^^
+
+When an HD-promoted function is reached through a chain of intermediate
+functions, clang shows the full call chain. The first note in each chain uses
+"called by" and subsequent notes use "which is called by":
+
+.. code-block:: text
+
+   error: reference to __host__ function 'host_only' in __host__ __device__ 
function
+     note: called by 'helper1'
+     note: which is called by 'device_func1'
+     note: called by 'helper2'
+     note: which is called by 'device_func2'
+
+Each "called by" starts a new chain, and "which is called by" continues it.
+This makes it clear which device function ultimately forced the code into
+device context.
+
 C++ Standard Parallelism Offload Support: Compiler And Runtime
 ==============================================================
 

diff  --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 0c25eb2443d5e..fae63cc0ba139 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9692,6 +9692,7 @@ def err_deleted_inherited_ctor_use : Error<
   "constructor inherited by %0 from base class %1 is implicitly deleted">;
 
 def note_called_by : Note<"called by %0">;
+def note_which_is_called_by : Note<"which is called by %0">;
 def err_kern_type_not_void_return : Error<
   "kernel function type %0 must have void return type">;
 def err_kern_is_nonstatic_method : Error<

diff  --git a/clang/include/clang/Sema/SemaCUDA.h 
b/clang/include/clang/Sema/SemaCUDA.h
index dbb4290f5d149..2907fe4099d8d 100644
--- a/clang/include/clang/Sema/SemaCUDA.h
+++ b/clang/include/clang/Sema/SemaCUDA.h
@@ -72,13 +72,14 @@ class SemaCUDA : public SemaBase {
   /// same deferred diag twice.
   llvm::DenseSet<FunctionDeclAndLoc> LocsWithCUDACallDiags;
 
-  /// An inverse call graph, mapping known-emitted functions to one of their
+  /// An inverse call graph, mapping known-emitted functions to their
   /// known-emitted callers (plus the location of the call).
   ///
   /// Functions that we can tell a priori must be emitted aren't added to this
-  /// map.
+  /// map. A function may have multiple callers that force it into device
+  /// context, so we store all of them to produce complete diagnostics.
   llvm::DenseMap</* Callee = */ CanonicalDeclPtr<const FunctionDecl>,
-                 /* Caller = */ FunctionDeclAndLoc>
+                 /* Callers = */ llvm::SmallVector<FunctionDeclAndLoc, 1>>
       DeviceKnownEmittedFns;
 
   /// Creates a SemaDiagnosticBuilder that emits the diagnostic if the current

diff  --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index 3065b5e1e66d3..8b1d0398cf65d 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -73,6 +73,7 @@
 #include "clang/Sema/TypoCorrection.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/Support/TimeProfiler.h"
 #include <optional>
@@ -1815,17 +1816,27 @@ bool Sema::hasUncompilableErrorOccurred() const {
 }
 
 // Print notes showing how we can reach FD starting from an a priori
-// known-callable function.
+// known-callable function. When a function has multiple callers, emit
+// each call chain separately. The first note in each chain uses
+// "called by" and subsequent notes use "which is called by".
 static void emitCallStackNotes(Sema &S, const FunctionDecl *FD) {
   auto FnIt = S.CUDA().DeviceKnownEmittedFns.find(FD);
-  while (FnIt != S.CUDA().DeviceKnownEmittedFns.end()) {
-    // Respect error limit.
+  if (FnIt == S.CUDA().DeviceKnownEmittedFns.end())
+    return;
+
+  for (const auto &CallerInfo : FnIt->second) {
     if (S.Diags.hasFatalErrorOccurred())
       return;
-    DiagnosticBuilder Builder(
-        S.Diags.Report(FnIt->second.Loc, diag::note_called_by));
-    Builder << FnIt->second.FD;
-    FnIt = S.CUDA().DeviceKnownEmittedFns.find(FnIt->second.FD);
+    S.Diags.Report(CallerInfo.Loc, diag::note_called_by) << CallerInfo.FD;
+    // Walk up the rest of the chain using "which is called by".
+    auto NextIt = S.CUDA().DeviceKnownEmittedFns.find(CallerInfo.FD);
+    while (NextIt != S.CUDA().DeviceKnownEmittedFns.end()) {
+      if (S.Diags.hasFatalErrorOccurred())
+        return;
+      const auto &Next = NextIt->second.front();
+      S.Diags.Report(Next.Loc, diag::note_which_is_called_by) << Next.FD;
+      NextIt = S.CUDA().DeviceKnownEmittedFns.find(Next.FD);
+    }
   }
 }
 
@@ -1875,6 +1886,11 @@ class DeferredDiagnosticsEmitter
   // 
diff erent depending on whether it is in OpenMP device context.
   llvm::SmallPtrSet<CanonicalDeclPtr<Decl>, 4> DoneMap[2];
 
+  // Functions that need their deferred diagnostics emitted. Collected
+  // during the graph walk and emitted afterwards so that all callers
+  // are known when producing call chain notes.
+  llvm::SetVector<CanonicalDeclPtr<const FunctionDecl>> FnsToEmit;
+
   // Emission state of the root node of the current use graph.
   bool ShouldEmitRootNode;
 
@@ -1969,13 +1985,16 @@ class DeferredDiagnosticsEmitter
     if (Caller && S.LangOpts.OpenMP && UsePath.size() == 1 &&
         (ShouldEmitRootNode || InOMPDeviceContext))
       S.OpenMP().finalizeOpenMPDelayedAnalysis(Caller, FD, Loc);
-    if (Caller)
-      S.CUDA().DeviceKnownEmittedFns[FD] = {Caller, Loc};
-    // Always emit deferred diagnostics for the direct users. This does not
-    // lead to explosion of diagnostics since each user is visited at most
-    // twice.
+    if (Caller) {
+      auto &Callers = S.CUDA().DeviceKnownEmittedFns[FD];
+      CanonicalDeclPtr<const FunctionDecl> CanonCaller(Caller);
+      if (llvm::none_of(Callers, [CanonCaller](const auto &C) {
+            return C.FD == CanonCaller;
+          }))
+        Callers.push_back({Caller, Loc});
+    }
     if (ShouldEmitRootNode || InOMPDeviceContext)
-      emitDeferredDiags(FD, Caller);
+      FnsToEmit.insert(FD);
     // Do not revisit a function if the function body has been completely
     // visited before.
     if (!Done.insert(FD).second)
@@ -2000,15 +2019,12 @@ class DeferredDiagnosticsEmitter
       checkVar(cast<VarDecl>(D));
   }
 
-  // Emit any deferred diagnostics for FD
-  void emitDeferredDiags(FunctionDecl *FD, bool ShowCallStack) {
+  void emitDeferredDiags(const FunctionDecl *FD) {
     auto It = S.DeviceDeferredDiags.find(FD);
     if (It == S.DeviceDeferredDiags.end())
       return;
     bool HasWarningOrError = false;
-    bool FirstDiag = true;
     for (PartialDiagnosticAt &PDAt : It->second) {
-      // Respect error limit.
       if (S.Diags.hasFatalErrorOccurred())
         return;
       const SourceLocation &Loc = PDAt.first;
@@ -2020,13 +2036,14 @@ class DeferredDiagnosticsEmitter
         DiagnosticBuilder Builder(S.Diags.Report(Loc, PD.getDiagID()));
         PD.Emit(Builder);
       }
-      // Emit the note on the first diagnostic in case too many diagnostics
-      // cause the note not emitted.
-      if (FirstDiag && HasWarningOrError && ShowCallStack) {
-        emitCallStackNotes(S, FD);
-        FirstDiag = false;
-      }
     }
+    if (HasWarningOrError)
+      emitCallStackNotes(S, FD);
+  }
+
+  void emitCollectedDiags() {
+    for (const auto &FD : FnsToEmit)
+      emitDeferredDiags(FD);
   }
 };
 } // namespace
@@ -2043,6 +2060,7 @@ void Sema::emitDeferredDiags() {
   DeferredDiagnosticsEmitter DDE(*this);
   for (auto *D : DeclsToCheckForDeferredDiags)
     DDE.checkRecordedDecl(D);
+  DDE.emitCollectedDiags();
 }
 
 // In CUDA, there are some constructs which may appear in semantically-valid

diff  --git a/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp 
b/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
index 9121740f98549..2907fb6f77380 100644
--- a/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
+++ b/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
@@ -247,7 +247,7 @@ long double c = q + b;
 #endif
 
 void hostFoo() {
-  boo(c - b);
+  boo(c - b); // expected-note {{called by 'hostFoo'}}
 }
 
 long double qa, qb;

diff  --git a/clang/test/SemaCUDA/deferred-diags-dedup.cu 
b/clang/test/SemaCUDA/deferred-diags-dedup.cu
new file mode 100644
index 0000000000000..320965fa7d939
--- /dev/null
+++ b/clang/test/SemaCUDA/deferred-diags-dedup.cu
@@ -0,0 +1,56 @@
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -fsyntax-only \
+// RUN:   -verify -Wno-vla %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -fsyntax-only \
+// RUN:   -verify -Wno-vla %s
+
+// NOTE: Do not autogenerate. Tests deferred diagnostic deduplication.
+
+// Tests that deferred diagnostics are emitted once per function, with all
+// callers listed as notes, rather than repeating the diagnostics for each
+// caller. See https://github.com/llvm/llvm-project/issues/180638.
+
+#include "Inputs/cuda.h"
+
+__host__ void hf(); // expected-note 3{{'hf' declared here}}
+
+// Lambda calling a host function. Its deferred diagnostics should be
+// emitted only once even when multiple device functions call it.
+__device__ auto l =
+  [] {
+    hf(); // expected-error {{reference to __host__ function 'hf' in __host__ 
__device__ function}}
+    hf(); // expected-error {{reference to __host__ function 'hf' in __host__ 
__device__ function}}
+  };
+
+__device__ void df1() {
+  l(); // expected-note {{called by 'df1'}}
+}
+
+__device__ void df2() {
+  l(); // expected-note {{called by 'df2'}}
+}
+
+__device__ void df3() {
+  l(); // expected-note {{called by 'df3'}}
+}
+
+// Test with shared call chains: two chains reaching the same function
+// through 
diff erent intermediate callers.
+inline __host__ __device__ void hdf() {
+  hf(); // expected-error {{reference to __host__ function 'hf' in __host__ 
__device__ function}}
+}
+
+inline __host__ __device__ void mid1() {
+  hdf(); // expected-note {{called by 'mid1'}}
+}
+
+__device__ void dev1() {
+  mid1(); // expected-note {{which is called by 'dev1'}}
+}
+
+inline __host__ __device__ void mid2() {
+  hdf(); // expected-note {{called by 'mid2'}}
+}
+
+__device__ void dev2() {
+  mid2(); // expected-note {{which is called by 'dev2'}}
+}

diff  --git a/clang/test/SemaCUDA/deferred-diags-limit.cu 
b/clang/test/SemaCUDA/deferred-diags-limit.cu
index 59328134da90a..6ce903acde754 100644
--- a/clang/test/SemaCUDA/deferred-diags-limit.cu
+++ b/clang/test/SemaCUDA/deferred-diags-limit.cu
@@ -8,13 +8,20 @@
 // CHECK-NOT: cannot use 'throw' in __host__ __device__ function
 // CHECK: too many errors emitted, stopping now
 
-inline __host__ __device__ void hasInvalid() {
+inline __host__ __device__ void hasInvalid1() {
+  throw NULL;
+}
+
+inline __host__ __device__ void hasInvalid2() {
+  throw NULL;
+}
+
+inline __host__ __device__ void hasInvalid3() {
   throw NULL;
 }
 
 __global__ void use0() {
-  hasInvalid();
-  hasInvalid();
-  hasInvalid();
-  hasInvalid();
+  hasInvalid1();
+  hasInvalid2();
+  hasInvalid3();
 }

diff  --git a/clang/test/SemaCUDA/deferred-diags.cu 
b/clang/test/SemaCUDA/deferred-diags.cu
index 125ddea95b996..0530889f749bb 100644
--- a/clang/test/SemaCUDA/deferred-diags.cu
+++ b/clang/test/SemaCUDA/deferred-diags.cu
@@ -5,12 +5,12 @@
 // Error, instantiated on device.
 inline __host__ __device__ void hasInvalid() {
   throw NULL;
-  // expected-error@-1 2{{cannot use 'throw' in __host__ __device__ function}}
+  // expected-error@-1 {{cannot use 'throw' in __host__ __device__ function}}
 }
 
 inline __host__ __device__ void hasInvalid2() {
   throw NULL;
-  // expected-error@-1 2{{cannot use 'throw' in __host__ __device__ function}}
+  // expected-error@-1 {{cannot use 'throw' in __host__ __device__ function}}
 }
 
 inline __host__ __device__ void hasInvalidDiscarded() {
@@ -20,7 +20,7 @@ inline __host__ __device__ void hasInvalidDiscarded() {
 
 static __device__ void use0() {
   hasInvalid(); // expected-note {{called by 'use0'}}
-  hasInvalid(); // expected-note {{called by 'use0'}}
+  hasInvalid();
 
   if constexpr (true) {
     hasInvalid2(); // expected-note {{called by 'use0'}}
@@ -31,7 +31,7 @@ static __device__ void use0() {
   if constexpr (false) {
     hasInvalidDiscarded();
   } else {
-    hasInvalid2(); // expected-note {{called by 'use0'}}
+    hasInvalid2();
   }
 
   if constexpr (false) {
@@ -39,24 +39,24 @@ static __device__ void use0() {
   }
 }
 
-// To avoid excessive diagnostic messages, deferred diagnostics are only
-// emitted the first time a function is called.
+// Deferred diagnostics are emitted once per function, with all callers
+// listed as notes.
 static __device__ void use1() {
-  use0(); // expected-note 4{{called by 'use1'}}
+  use0(); // expected-note 2{{which is called by 'use1'}}
   use0();
 }
 
 static __device__ void use2() {
-  use1(); // expected-note 4{{called by 'use2'}}
+  use1(); // expected-note 2{{which is called by 'use2'}}
   use1();
 }
 
 static __device__ void use3() {
-  use2(); // expected-note 4{{called by 'use3'}}
+  use2(); // expected-note 2{{which is called by 'use3'}}
   use2();
 }
 
 __global__ void use4() {
-  use3(); // expected-note 4{{called by 'use4'}}
+  use3(); // expected-note 2{{which is called by 'use4'}}
   use3();
 }

diff  --git 
a/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp 
b/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
index 1aa48c739c043..e2854983da552 100644
--- a/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
+++ b/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
@@ -27,17 +27,17 @@ template<int> struct KN;
 // emission of a function during device compilation (but not during host
 // compilation) and to trigger a diagnostic if ODR-used from a function
 // emitted during device compilation.
-// device-note@+1 4 {{attribute is here}}
+// device-note@+1 2 {{attribute is here}}
 [[clang::sycl_kernel_entry_point(KN<1>)]]
 void skep();
 struct SKL {
-  // device-note@+1 6 {{attribute is here}}
+  // device-note@+1 4 {{attribute is here}}
   [[clang::sycl_kernel_entry_point(KN<2>)]]
   void mskep();
-  // device-note@+1 6 {{attribute is here}}
+  // device-note@+1 4 {{attribute is here}}
   [[clang::sycl_kernel_entry_point(KN<3>)]]
   static void smskep();
-  // device-note@+1 2 {{attribute is here}}
+  // device-note@+1 {{attribute is here}}
   [[clang::sycl_kernel_entry_point(KN<4>)]]
   void operator()() const;
 };
@@ -62,22 +62,22 @@ void df() {
   (void)typeid(&SKL::mskep);
   (void)typeid(&SKL::smskep);
 
-  // device-error@+1 2 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   skep();
-  // device-error@+1 2 {{function 'mskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   SKL{}.mskep();
-  // device-error@+1 2 {{function 'smskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'smskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   SKL::smskep();
 
-  // device-error@+1 2 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   (void)&skep;
-  // device-error@+1 2 {{function 'mskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   (void)&SKL::mskep;
-  // device-error@+1 2 {{function 'smskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'smskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   (void)&SKL::smskep;
 
   SKL sklo;
-  // device-error@+1 2 {{function 'operator()' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'operator()' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   sklo();
 }
 
@@ -133,6 +133,7 @@ void SKL::operator()() const {
 void sedf() {
   // device-note@+1 {{called by 'sedf'}}
   df();
+  // device-note@+2 {{which is called by 'sedf'}}
   // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   skep();
   // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}

diff  --git a/llvm/docs/CompileCudaWithLLVM.rst 
b/llvm/docs/CompileCudaWithLLVM.rst
index 0bd121a895028..a557112c9e7f3 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -429,6 +429,21 @@ To enable these warnings, use the following compiler flag:
 
     -Wnvcc-compat
 
+Deferred Diagnostics
+--------------------
+
+In CUDA, a ``__host__ __device__`` function can be called from both host and
+device code. When such a function contains operations not valid on one side
+(e.g., calling a host-only function from device code), clang defers the
+diagnostics and only emits them if the function is actually reachable from a
+caller where the operation cannot be emitted. This avoids false errors in
+``__host__ __device__`` functions that are only used on the other side.
+
+For a detailed description of deferred diagnostics, HD-promoted functions,
+and call chain notes, see the
+`HIP Support 
<https://clang.llvm.org/docs/HIPSupport.html#deferred-diagnostics>`_
+documentation. The same mechanism applies to both CUDA and HIP.
+
 Using a Different Class on Host/Device
 --------------------------------------
 


        
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to