[clang] [llvm] [CUDA/HIP][SYCL] Deduplicate deferred diagnostics across multiple callers (PR #185926)

Yaxun Liu via cfe-commits Thu, 12 Mar 2026 12:55:55 -0700

https://github.com/yxsamliu updated 
https://github.com/llvm/llvm-project/pull/185926


>From 19c4233799fcbf7aa62581ecf470f18351379528 Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <[email protected]>
Date: Wed, 11 Mar 2026 12:40:58 -0400
Subject: [PATCH 1/2] [CUDA/HIP][SYCL] Deduplicate deferred diagnostics across
 multiple callers

Deferred diagnostics for a function were emitted once per caller that
forced the function into device context. When multiple device functions
called the same host-device function containing errors, the diagnostics
were repeated for each caller, producing noisy duplicate output.

Change the deferred diagnostic emission to a two-pass approach:
1. During the call graph walk, collect callers in DeviceKnownEmittedFns
   (now storing multiple callers per function) and mark functions that
   need diagnostics, but don't emit yet.
2. After the walk completes, emit diagnostics once per function with
   all callers listed as notes.

Call chain notes now use "called by" for the first caller in each chain
and "then called by" for subsequent callers in the chain, making it
easy to distinguish separate call chains.

Also add documentation for deferred diagnostics and the concept of
device-promoted functions to the HIP and CUDA docs.

Fixes: https://github.com/llvm/llvm-project/issues/180638
---
 clang/docs/HIPSupport.rst                     | 82 +++++++++++++++++++
 .../clang/Basic/DiagnosticSemaKinds.td        |  1 +
 clang/include/clang/Sema/SemaCUDA.h           |  7 +-
 clang/lib/Sema/Sema.cpp                       | 65 +++++++++------
 .../nvptx_unsupported_type_messages.cpp       |  2 +-
 clang/test/SemaCUDA/deferred-diags-dedup.cu   | 56 +++++++++++++
 clang/test/SemaCUDA/deferred-diags-limit.cu   | 17 ++--
 clang/test/SemaCUDA/deferred-diags.cu         | 20 ++---
 ...kernel-entry-point-attr-device-odr-use.cpp | 23 +++---
 llvm/docs/CompileCudaWithLLVM.rst             | 15 ++++
 10 files changed, 235 insertions(+), 53 deletions(-)
 create mode 100644 clang/test/SemaCUDA/deferred-diags-dedup.cu

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index c2a91a3062bc3..9a47fa808d3e9 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -484,6 +484,88 @@ non-constexpr function, which is by default a host 
function.
 Users can override the inferred host and device attributes of default
 destructors by adding explicit host and device attributes to them.
 
+Deferred Diagnostics
+====================
+
+In HIP (and CUDA), a ``__host__ __device__`` function can be called from both
+host and device code. Certain operations are not allowed on the device (e.g.,
+calling a host-only function, using variable-length arrays, or throwing
+exceptions). However, a ``__host__ __device__`` function containing such
+operations is only ill-formed if it is actually called from device code.
+
+Clang handles this through *deferred diagnostics*: errors and warnings in
+``__host__ __device__`` functions are recorded during parsing but not emitted
+immediately. They are only emitted if the function turns out to be reachable
+from code that must run on the device.
+
+Device-Promoted Functions
+-------------------------
+
+A *device-promoted function* is a function that is not explicitly restricted to
+device context (it is either ``__host__ __device__`` or, in the case of
+lambdas, implicitly ``__host__ __device__``) but is used from device code,
+forcing it to be compiled for the device. Device-promoted functions are the
+primary source of deferred diagnostics.
+
+Common examples of device-promoted functions:
+
+- Lambdas without explicit ``__host__`` or ``__device__`` attributes
+- ``__host__ __device__`` functions that call host-only functions
+- ``inline __host__ __device__`` helper functions used from device code
+
+When a device-promoted function contains operations that are not valid on the
+device, clang emits the deferred diagnostics along with notes showing how the
+function was reached from device code.
+
+Example
+^^^^^^^
+
+.. code-block:: c++
+
+   __host__ void host_only();
+
+   // This lambda is implicitly __host__ __device__. It is device-promoted
+   // when called from a __device__ function.
+   __device__ auto lambda = [] {
+     host_only();  // error: only emitted if lambda is used from device code
+   };
+
+   __device__ void df1() {
+     lambda();  // triggers deferred diagnostic for lambda
+   }
+
+   __device__ void df2() {
+     lambda();  // same lambda, same error — not duplicated
+   }
+
+Clang emits the error once and lists all device callers:
+
+.. code-block:: text
+
+   error: reference to __host__ function 'host_only' in __host__ __device__ 
function
+     note: 'host_only' declared here
+     note: called by 'df1'
+     note: called by 'df2'
+
+Call Chain Notes
+^^^^^^^^^^^^^^^^
+
+When a device-promoted function is reached through a chain of intermediate
+functions, clang shows the full call chain. The first note in each chain uses
+"called by" and subsequent notes use "then called by":
+
+.. code-block:: text
+
+   error: reference to __host__ function 'host_only' in __host__ __device__ 
function
+     note: called by 'helper1'
+     note: then called by 'device_func1'
+     note: called by 'helper2'
+     note: then called by 'device_func2'
+
+Each "called by" starts a new chain, and "then called by" continues it. This
+makes it clear which device function ultimately forced the code into device
+context.
+
 C++ Standard Parallelism Offload Support: Compiler And Runtime
 ==============================================================
 
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 0c25eb2443d5e..b53d8bcd9a171 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9692,6 +9692,7 @@ def err_deleted_inherited_ctor_use : Error<
   "constructor inherited by %0 from base class %1 is implicitly deleted">;
 
 def note_called_by : Note<"called by %0">;
+def note_then_called_by : Note<"then called by %0">;
 def err_kern_type_not_void_return : Error<
   "kernel function type %0 must have void return type">;
 def err_kern_is_nonstatic_method : Error<
diff --git a/clang/include/clang/Sema/SemaCUDA.h 
b/clang/include/clang/Sema/SemaCUDA.h
index dbb4290f5d149..2907fe4099d8d 100644
--- a/clang/include/clang/Sema/SemaCUDA.h
+++ b/clang/include/clang/Sema/SemaCUDA.h
@@ -72,13 +72,14 @@ class SemaCUDA : public SemaBase {
   /// same deferred diag twice.
   llvm::DenseSet<FunctionDeclAndLoc> LocsWithCUDACallDiags;
 
-  /// An inverse call graph, mapping known-emitted functions to one of their
+  /// An inverse call graph, mapping known-emitted functions to their
   /// known-emitted callers (plus the location of the call).
   ///
   /// Functions that we can tell a priori must be emitted aren't added to this
-  /// map.
+  /// map. A function may have multiple callers that force it into device
+  /// context, so we store all of them to produce complete diagnostics.
   llvm::DenseMap</* Callee = */ CanonicalDeclPtr<const FunctionDecl>,
-                 /* Caller = */ FunctionDeclAndLoc>
+                 /* Callers = */ llvm::SmallVector<FunctionDeclAndLoc, 1>>
       DeviceKnownEmittedFns;
 
   /// Creates a SemaDiagnosticBuilder that emits the diagnostic if the current
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index 3065b5e1e66d3..f13781498a5e2 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -73,6 +73,7 @@
 #include "clang/Sema/TypoCorrection.h"
 #include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/SetVector.h"
 #include "llvm/ADT/SmallPtrSet.h"
 #include "llvm/Support/TimeProfiler.h"
 #include <optional>
@@ -1815,17 +1816,27 @@ bool Sema::hasUncompilableErrorOccurred() const {
 }
 
 // Print notes showing how we can reach FD starting from an a priori
-// known-callable function.
+// known-callable function. When a function has multiple callers, emit
+// each call chain separately. The first note in each chain uses
+// "called by" and subsequent notes use "then called by".
 static void emitCallStackNotes(Sema &S, const FunctionDecl *FD) {
   auto FnIt = S.CUDA().DeviceKnownEmittedFns.find(FD);
-  while (FnIt != S.CUDA().DeviceKnownEmittedFns.end()) {
-    // Respect error limit.
+  if (FnIt == S.CUDA().DeviceKnownEmittedFns.end())
+    return;
+
+  for (const auto &CallerInfo : FnIt->second) {
     if (S.Diags.hasFatalErrorOccurred())
       return;
-    DiagnosticBuilder Builder(
-        S.Diags.Report(FnIt->second.Loc, diag::note_called_by));
-    Builder << FnIt->second.FD;
-    FnIt = S.CUDA().DeviceKnownEmittedFns.find(FnIt->second.FD);
+    S.Diags.Report(CallerInfo.Loc, diag::note_called_by) << CallerInfo.FD;
+    // Walk up the rest of the chain using "then called by".
+    auto NextIt = S.CUDA().DeviceKnownEmittedFns.find(CallerInfo.FD);
+    while (NextIt != S.CUDA().DeviceKnownEmittedFns.end()) {
+      if (S.Diags.hasFatalErrorOccurred())
+        return;
+      const auto &Next = NextIt->second.front();
+      S.Diags.Report(Next.Loc, diag::note_then_called_by) << Next.FD;
+      NextIt = S.CUDA().DeviceKnownEmittedFns.find(Next.FD);
+    }
   }
 }
 
@@ -1875,6 +1886,11 @@ class DeferredDiagnosticsEmitter
   // different depending on whether it is in OpenMP device context.
   llvm::SmallPtrSet<CanonicalDeclPtr<Decl>, 4> DoneMap[2];
 
+  // Functions that need their deferred diagnostics emitted. Collected
+  // during the graph walk and emitted afterwards so that all callers
+  // are known when producing call chain notes.
+  llvm::SetVector<CanonicalDeclPtr<const FunctionDecl>> FnsToEmit;
+
   // Emission state of the root node of the current use graph.
   bool ShouldEmitRootNode;
 
@@ -1969,13 +1985,17 @@ class DeferredDiagnosticsEmitter
     if (Caller && S.LangOpts.OpenMP && UsePath.size() == 1 &&
         (ShouldEmitRootNode || InOMPDeviceContext))
       S.OpenMP().finalizeOpenMPDelayedAnalysis(Caller, FD, Loc);
-    if (Caller)
-      S.CUDA().DeviceKnownEmittedFns[FD] = {Caller, Loc};
-    // Always emit deferred diagnostics for the direct users. This does not
-    // lead to explosion of diagnostics since each user is visited at most
-    // twice.
+    if (Caller) {
+      auto &Callers = S.CUDA().DeviceKnownEmittedFns[FD];
+      CanonicalDeclPtr<const FunctionDecl> CanonCaller(Caller);
+      if (llvm::none_of(Callers,
+                        [CanonCaller](const auto &C) {
+                          return C.FD == CanonCaller;
+                        }))
+        Callers.push_back({Caller, Loc});
+    }
     if (ShouldEmitRootNode || InOMPDeviceContext)
-      emitDeferredDiags(FD, Caller);
+      FnsToEmit.insert(FD);
     // Do not revisit a function if the function body has been completely
     // visited before.
     if (!Done.insert(FD).second)
@@ -2000,15 +2020,12 @@ class DeferredDiagnosticsEmitter
       checkVar(cast<VarDecl>(D));
   }
 
-  // Emit any deferred diagnostics for FD
-  void emitDeferredDiags(FunctionDecl *FD, bool ShowCallStack) {
+  void emitDeferredDiags(const FunctionDecl *FD) {
     auto It = S.DeviceDeferredDiags.find(FD);
     if (It == S.DeviceDeferredDiags.end())
       return;
     bool HasWarningOrError = false;
-    bool FirstDiag = true;
     for (PartialDiagnosticAt &PDAt : It->second) {
-      // Respect error limit.
       if (S.Diags.hasFatalErrorOccurred())
         return;
       const SourceLocation &Loc = PDAt.first;
@@ -2020,13 +2037,14 @@ class DeferredDiagnosticsEmitter
         DiagnosticBuilder Builder(S.Diags.Report(Loc, PD.getDiagID()));
         PD.Emit(Builder);
       }
-      // Emit the note on the first diagnostic in case too many diagnostics
-      // cause the note not emitted.
-      if (FirstDiag && HasWarningOrError && ShowCallStack) {
-        emitCallStackNotes(S, FD);
-        FirstDiag = false;
-      }
     }
+    if (HasWarningOrError)
+      emitCallStackNotes(S, FD);
+  }
+
+  void emitCollectedDiags() {
+    for (const auto &FD : FnsToEmit)
+      emitDeferredDiags(FD);
   }
 };
 } // namespace
@@ -2043,6 +2061,7 @@ void Sema::emitDeferredDiags() {
   DeferredDiagnosticsEmitter DDE(*this);
   for (auto *D : DeclsToCheckForDeferredDiags)
     DDE.checkRecordedDecl(D);
+  DDE.emitCollectedDiags();
 }
 
 // In CUDA, there are some constructs which may appear in semantically-valid
diff --git a/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp 
b/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
index 9121740f98549..2907fb6f77380 100644
--- a/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
+++ b/clang/test/OpenMP/nvptx_unsupported_type_messages.cpp
@@ -247,7 +247,7 @@ long double c = q + b;
 #endif
 
 void hostFoo() {
-  boo(c - b);
+  boo(c - b); // expected-note {{called by 'hostFoo'}}
 }
 
 long double qa, qb;
diff --git a/clang/test/SemaCUDA/deferred-diags-dedup.cu 
b/clang/test/SemaCUDA/deferred-diags-dedup.cu
new file mode 100644
index 0000000000000..0739921c5f9cd
--- /dev/null
+++ b/clang/test/SemaCUDA/deferred-diags-dedup.cu
@@ -0,0 +1,56 @@
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fcuda-is-device -fsyntax-only \
+// RUN:   -verify -Wno-vla %s
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -fcuda-is-device -fsyntax-only \
+// RUN:   -verify -Wno-vla %s
+
+// NOTE: Do not autogenerate. Tests deferred diagnostic deduplication.
+
+// Tests that deferred diagnostics are emitted once per function, with all
+// callers listed as notes, rather than repeating the diagnostics for each
+// caller. See https://github.com/llvm/llvm-project/issues/180638.
+
+#include "Inputs/cuda.h"
+
+__host__ void hf(); // expected-note 3{{'hf' declared here}}
+
+// Lambda calling a host function. Its deferred diagnostics should be
+// emitted only once even when multiple device functions call it.
+__device__ auto l =
+  [] {
+    hf(); // expected-error {{reference to __host__ function 'hf' in __host__ 
__device__ function}}
+    hf(); // expected-error {{reference to __host__ function 'hf' in __host__ 
__device__ function}}
+  };
+
+__device__ void df1() {
+  l(); // expected-note {{called by 'df1'}}
+}
+
+__device__ void df2() {
+  l(); // expected-note {{called by 'df2'}}
+}
+
+__device__ void df3() {
+  l(); // expected-note {{called by 'df3'}}
+}
+
+// Test with shared call chains: two chains reaching the same function
+// through different intermediate callers.
+inline __host__ __device__ void hdf() {
+  hf(); // expected-error {{reference to __host__ function 'hf' in __host__ 
__device__ function}}
+}
+
+inline __host__ __device__ void mid1() {
+  hdf(); // expected-note {{called by 'mid1'}}
+}
+
+__device__ void dev1() {
+  mid1(); // expected-note {{then called by 'dev1'}}
+}
+
+inline __host__ __device__ void mid2() {
+  hdf(); // expected-note {{called by 'mid2'}}
+}
+
+__device__ void dev2() {
+  mid2(); // expected-note {{then called by 'dev2'}}
+}
diff --git a/clang/test/SemaCUDA/deferred-diags-limit.cu 
b/clang/test/SemaCUDA/deferred-diags-limit.cu
index 59328134da90a..6ce903acde754 100644
--- a/clang/test/SemaCUDA/deferred-diags-limit.cu
+++ b/clang/test/SemaCUDA/deferred-diags-limit.cu
@@ -8,13 +8,20 @@
 // CHECK-NOT: cannot use 'throw' in __host__ __device__ function
 // CHECK: too many errors emitted, stopping now
 
-inline __host__ __device__ void hasInvalid() {
+inline __host__ __device__ void hasInvalid1() {
+  throw NULL;
+}
+
+inline __host__ __device__ void hasInvalid2() {
+  throw NULL;
+}
+
+inline __host__ __device__ void hasInvalid3() {
   throw NULL;
 }
 
 __global__ void use0() {
-  hasInvalid();
-  hasInvalid();
-  hasInvalid();
-  hasInvalid();
+  hasInvalid1();
+  hasInvalid2();
+  hasInvalid3();
 }
diff --git a/clang/test/SemaCUDA/deferred-diags.cu 
b/clang/test/SemaCUDA/deferred-diags.cu
index 125ddea95b996..99c291b694b97 100644
--- a/clang/test/SemaCUDA/deferred-diags.cu
+++ b/clang/test/SemaCUDA/deferred-diags.cu
@@ -5,12 +5,12 @@
 // Error, instantiated on device.
 inline __host__ __device__ void hasInvalid() {
   throw NULL;
-  // expected-error@-1 2{{cannot use 'throw' in __host__ __device__ function}}
+  // expected-error@-1 {{cannot use 'throw' in __host__ __device__ function}}
 }
 
 inline __host__ __device__ void hasInvalid2() {
   throw NULL;
-  // expected-error@-1 2{{cannot use 'throw' in __host__ __device__ function}}
+  // expected-error@-1 {{cannot use 'throw' in __host__ __device__ function}}
 }
 
 inline __host__ __device__ void hasInvalidDiscarded() {
@@ -20,7 +20,7 @@ inline __host__ __device__ void hasInvalidDiscarded() {
 
 static __device__ void use0() {
   hasInvalid(); // expected-note {{called by 'use0'}}
-  hasInvalid(); // expected-note {{called by 'use0'}}
+  hasInvalid();
 
   if constexpr (true) {
     hasInvalid2(); // expected-note {{called by 'use0'}}
@@ -31,7 +31,7 @@ static __device__ void use0() {
   if constexpr (false) {
     hasInvalidDiscarded();
   } else {
-    hasInvalid2(); // expected-note {{called by 'use0'}}
+    hasInvalid2();
   }
 
   if constexpr (false) {
@@ -39,24 +39,24 @@ static __device__ void use0() {
   }
 }
 
-// To avoid excessive diagnostic messages, deferred diagnostics are only
-// emitted the first time a function is called.
+// Deferred diagnostics are emitted once per function, with all callers
+// listed as notes.
 static __device__ void use1() {
-  use0(); // expected-note 4{{called by 'use1'}}
+  use0(); // expected-note 2{{then called by 'use1'}}
   use0();
 }
 
 static __device__ void use2() {
-  use1(); // expected-note 4{{called by 'use2'}}
+  use1(); // expected-note 2{{then called by 'use2'}}
   use1();
 }
 
 static __device__ void use3() {
-  use2(); // expected-note 4{{called by 'use3'}}
+  use2(); // expected-note 2{{then called by 'use3'}}
   use2();
 }
 
 __global__ void use4() {
-  use3(); // expected-note 4{{called by 'use4'}}
+  use3(); // expected-note 2{{then called by 'use4'}}
   use3();
 }
diff --git 
a/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp 
b/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
index 1aa48c739c043..def758aac7c90 100644
--- a/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
+++ b/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
@@ -27,17 +27,17 @@ template<int> struct KN;
 // emission of a function during device compilation (but not during host
 // compilation) and to trigger a diagnostic if ODR-used from a function
 // emitted during device compilation.
-// device-note@+1 4 {{attribute is here}}
+// device-note@+1 2 {{attribute is here}}
 [[clang::sycl_kernel_entry_point(KN<1>)]]
 void skep();
 struct SKL {
-  // device-note@+1 6 {{attribute is here}}
+  // device-note@+1 4 {{attribute is here}}
   [[clang::sycl_kernel_entry_point(KN<2>)]]
   void mskep();
-  // device-note@+1 6 {{attribute is here}}
+  // device-note@+1 4 {{attribute is here}}
   [[clang::sycl_kernel_entry_point(KN<3>)]]
   static void smskep();
-  // device-note@+1 2 {{attribute is here}}
+  // device-note@+1 {{attribute is here}}
   [[clang::sycl_kernel_entry_point(KN<4>)]]
   void operator()() const;
 };
@@ -62,22 +62,22 @@ void df() {
   (void)typeid(&SKL::mskep);
   (void)typeid(&SKL::smskep);
 
-  // device-error@+1 2 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   skep();
-  // device-error@+1 2 {{function 'mskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   SKL{}.mskep();
-  // device-error@+1 2 {{function 'smskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'smskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   SKL::smskep();
 
-  // device-error@+1 2 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   (void)&skep;
-  // device-error@+1 2 {{function 'mskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   (void)&SKL::mskep;
-  // device-error@+1 2 {{function 'smskep' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'smskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   (void)&SKL::smskep;
 
   SKL sklo;
-  // device-error@+1 2 {{function 'operator()' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
+  // device-error@+1 {{function 'operator()' cannot be used in device code 
because it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   sklo();
 }
 
@@ -133,6 +133,7 @@ void SKL::operator()() const {
 void sedf() {
   // device-note@+1 {{called by 'sedf'}}
   df();
+  // device-note@+2 {{then called by 'sedf'}}
   // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   skep();
   // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
diff --git a/llvm/docs/CompileCudaWithLLVM.rst 
b/llvm/docs/CompileCudaWithLLVM.rst
index 0bd121a895028..59fa327d07d6a 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -429,6 +429,21 @@ To enable these warnings, use the following compiler flag:
 
     -Wnvcc-compat
 
+Deferred Diagnostics
+--------------------
+
+In CUDA, a ``__host__ __device__`` function can be called from both host and
+device code. When such a function contains operations not valid on the device
+(e.g., calling a host-only function), clang defers the diagnostics and only
+emits them if the function is actually reachable from device code. This avoids
+false errors in ``__host__ __device__`` functions that are only used on the
+host side.
+
+For a detailed description of deferred diagnostics, device-promoted functions,
+and call chain notes, see the
+`HIP Support 
<https://clang.llvm.org/docs/HIPSupport.html#deferred-diagnostics>`_
+documentation. The same mechanism applies to both CUDA and HIP.
+
 Using a Different Class on Host/Device
 --------------------------------------
 

>From ad42e81378a9efe901f8d575a740db1373c49a4e Mon Sep 17 00:00:00 2001
From: "Yaxun (Sam) Liu" <[email protected]>
Date: Thu, 12 Mar 2026 13:37:00 -0400
Subject: [PATCH 2/2] Address review: rename to HD-promoted, use "which is
 called by", fix formatting

- Clarify doc wording: diagnostics fire when a device function with
  guaranteed IR emission calls an HD function with invalid operations
- Rename "device-promoted" to "HD-promoted" in documentation
- Change "then called by" to "which is called by" for clearer call chains
- Fix clang-format issue in Sema.cpp
---
 clang/docs/HIPSupport.rst                     | 56 ++++++++++---------
 .../clang/Basic/DiagnosticSemaKinds.td        |  2 +-
 clang/lib/Sema/Sema.cpp                       | 13 ++---
 clang/test/SemaCUDA/deferred-diags-dedup.cu   |  4 +-
 clang/test/SemaCUDA/deferred-diags.cu         |  8 +--
 ...kernel-entry-point-attr-device-odr-use.cpp |  2 +-
 llvm/docs/CompileCudaWithLLVM.rst             | 12 ++--
 7 files changed, 49 insertions(+), 48 deletions(-)

diff --git a/clang/docs/HIPSupport.rst b/clang/docs/HIPSupport.rst
index 9a47fa808d3e9..e7f00202c34e7 100644
--- a/clang/docs/HIPSupport.rst
+++ b/clang/docs/HIPSupport.rst
@@ -488,34 +488,36 @@ Deferred Diagnostics
 ====================
 
 In HIP (and CUDA), a ``__host__ __device__`` function can be called from both
-host and device code. Certain operations are not allowed on the device (e.g.,
-calling a host-only function, using variable-length arrays, or throwing
-exceptions). However, a ``__host__ __device__`` function containing such
-operations is only ill-formed if it is actually called from device code.
+host and device code. Certain operations may not be valid on one side (e.g.,
+calling a host-only function from device code, or referencing a device-only
+function from host code). However, a ``__host__ __device__`` function
+containing such operations is only ill-formed if the function is actually
+reachable from a caller where the operation cannot be emitted.
 
 Clang handles this through *deferred diagnostics*: errors and warnings in
 ``__host__ __device__`` functions are recorded during parsing but not emitted
-immediately. They are only emitted if the function turns out to be reachable
-from code that must run on the device.
+immediately. They are only emitted when a function whose linkage guarantees
+IR emission (e.g., a kernel, or an externally visible ``__host__`` or
+``__device__`` function) directly or indirectly calls the
+``__host__ __device__`` function containing the invalid operation.
 
-Device-Promoted Functions
--------------------------
+HD-Promoted Functions
+---------------------
 
-A *device-promoted function* is a function that is not explicitly restricted to
-device context (it is either ``__host__ __device__`` or, in the case of
-lambdas, implicitly ``__host__ __device__``) but is used from device code,
-forcing it to be compiled for the device. Device-promoted functions are the
-primary source of deferred diagnostics.
+An *HD-promoted function* is a function that is implicitly or explicitly
+``__host__ __device__`` and is only emitted because a caller requires it —
+it does not have standalone linkage that guarantees emission. HD-promoted
+functions are the primary source of deferred diagnostics.
 
-Common examples of device-promoted functions:
+Common examples of HD-promoted functions:
 
 - Lambdas without explicit ``__host__`` or ``__device__`` attributes
-- ``__host__ __device__`` functions that call host-only functions
-- ``inline __host__ __device__`` helper functions used from device code
+- ``inline __host__ __device__`` helper functions
+- ``__host__ __device__`` template instantiations
 
-When a device-promoted function contains operations that are not valid on the
-device, clang emits the deferred diagnostics along with notes showing how the
-function was reached from device code.
+When an HD-promoted function contains operations that are not valid on the
+caller's side, clang emits the deferred diagnostics along with notes showing
+how the function was reached.
 
 Example
 ^^^^^^^
@@ -524,7 +526,7 @@ Example
 
    __host__ void host_only();
 
-   // This lambda is implicitly __host__ __device__. It is device-promoted
+   // This lambda is implicitly __host__ __device__. It is HD-promoted
    // when called from a __device__ function.
    __device__ auto lambda = [] {
      host_only();  // error: only emitted if lambda is used from device code
@@ -550,21 +552,21 @@ Clang emits the error once and lists all device callers:
 Call Chain Notes
 ^^^^^^^^^^^^^^^^
 
-When a device-promoted function is reached through a chain of intermediate
+When an HD-promoted function is reached through a chain of intermediate
 functions, clang shows the full call chain. The first note in each chain uses
-"called by" and subsequent notes use "then called by":
+"called by" and subsequent notes use "which is called by":
 
 .. code-block:: text
 
    error: reference to __host__ function 'host_only' in __host__ __device__ 
function
      note: called by 'helper1'
-     note: then called by 'device_func1'
+     note: which is called by 'device_func1'
      note: called by 'helper2'
-     note: then called by 'device_func2'
+     note: which is called by 'device_func2'
 
-Each "called by" starts a new chain, and "then called by" continues it. This
-makes it clear which device function ultimately forced the code into device
-context.
+Each "called by" starts a new chain, and "which is called by" continues it.
+This makes it clear which device function ultimately forced the code into
+device context.
 
 C++ Standard Parallelism Offload Support: Compiler And Runtime
 ==============================================================
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td 
b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index b53d8bcd9a171..fae63cc0ba139 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9692,7 +9692,7 @@ def err_deleted_inherited_ctor_use : Error<
   "constructor inherited by %0 from base class %1 is implicitly deleted">;
 
 def note_called_by : Note<"called by %0">;
-def note_then_called_by : Note<"then called by %0">;
+def note_which_is_called_by : Note<"which is called by %0">;
 def err_kern_type_not_void_return : Error<
   "kernel function type %0 must have void return type">;
 def err_kern_is_nonstatic_method : Error<
diff --git a/clang/lib/Sema/Sema.cpp b/clang/lib/Sema/Sema.cpp
index f13781498a5e2..8b1d0398cf65d 100644
--- a/clang/lib/Sema/Sema.cpp
+++ b/clang/lib/Sema/Sema.cpp
@@ -1818,7 +1818,7 @@ bool Sema::hasUncompilableErrorOccurred() const {
 // Print notes showing how we can reach FD starting from an a priori
 // known-callable function. When a function has multiple callers, emit
 // each call chain separately. The first note in each chain uses
-// "called by" and subsequent notes use "then called by".
+// "called by" and subsequent notes use "which is called by".
 static void emitCallStackNotes(Sema &S, const FunctionDecl *FD) {
   auto FnIt = S.CUDA().DeviceKnownEmittedFns.find(FD);
   if (FnIt == S.CUDA().DeviceKnownEmittedFns.end())
@@ -1828,13 +1828,13 @@ static void emitCallStackNotes(Sema &S, const 
FunctionDecl *FD) {
     if (S.Diags.hasFatalErrorOccurred())
       return;
     S.Diags.Report(CallerInfo.Loc, diag::note_called_by) << CallerInfo.FD;
-    // Walk up the rest of the chain using "then called by".
+    // Walk up the rest of the chain using "which is called by".
     auto NextIt = S.CUDA().DeviceKnownEmittedFns.find(CallerInfo.FD);
     while (NextIt != S.CUDA().DeviceKnownEmittedFns.end()) {
       if (S.Diags.hasFatalErrorOccurred())
         return;
       const auto &Next = NextIt->second.front();
-      S.Diags.Report(Next.Loc, diag::note_then_called_by) << Next.FD;
+      S.Diags.Report(Next.Loc, diag::note_which_is_called_by) << Next.FD;
       NextIt = S.CUDA().DeviceKnownEmittedFns.find(Next.FD);
     }
   }
@@ -1988,10 +1988,9 @@ class DeferredDiagnosticsEmitter
     if (Caller) {
       auto &Callers = S.CUDA().DeviceKnownEmittedFns[FD];
       CanonicalDeclPtr<const FunctionDecl> CanonCaller(Caller);
-      if (llvm::none_of(Callers,
-                        [CanonCaller](const auto &C) {
-                          return C.FD == CanonCaller;
-                        }))
+      if (llvm::none_of(Callers, [CanonCaller](const auto &C) {
+            return C.FD == CanonCaller;
+          }))
         Callers.push_back({Caller, Loc});
     }
     if (ShouldEmitRootNode || InOMPDeviceContext)
diff --git a/clang/test/SemaCUDA/deferred-diags-dedup.cu 
b/clang/test/SemaCUDA/deferred-diags-dedup.cu
index 0739921c5f9cd..320965fa7d939 100644
--- a/clang/test/SemaCUDA/deferred-diags-dedup.cu
+++ b/clang/test/SemaCUDA/deferred-diags-dedup.cu
@@ -44,7 +44,7 @@ inline __host__ __device__ void mid1() {
 }
 
 __device__ void dev1() {
-  mid1(); // expected-note {{then called by 'dev1'}}
+  mid1(); // expected-note {{which is called by 'dev1'}}
 }
 
 inline __host__ __device__ void mid2() {
@@ -52,5 +52,5 @@ inline __host__ __device__ void mid2() {
 }
 
 __device__ void dev2() {
-  mid2(); // expected-note {{then called by 'dev2'}}
+  mid2(); // expected-note {{which is called by 'dev2'}}
 }
diff --git a/clang/test/SemaCUDA/deferred-diags.cu 
b/clang/test/SemaCUDA/deferred-diags.cu
index 99c291b694b97..0530889f749bb 100644
--- a/clang/test/SemaCUDA/deferred-diags.cu
+++ b/clang/test/SemaCUDA/deferred-diags.cu
@@ -42,21 +42,21 @@ static __device__ void use0() {
 // Deferred diagnostics are emitted once per function, with all callers
 // listed as notes.
 static __device__ void use1() {
-  use0(); // expected-note 2{{then called by 'use1'}}
+  use0(); // expected-note 2{{which is called by 'use1'}}
   use0();
 }
 
 static __device__ void use2() {
-  use1(); // expected-note 2{{then called by 'use2'}}
+  use1(); // expected-note 2{{which is called by 'use2'}}
   use1();
 }
 
 static __device__ void use3() {
-  use2(); // expected-note 2{{then called by 'use3'}}
+  use2(); // expected-note 2{{which is called by 'use3'}}
   use2();
 }
 
 __global__ void use4() {
-  use3(); // expected-note 2{{then called by 'use4'}}
+  use3(); // expected-note 2{{which is called by 'use4'}}
   use3();
 }
diff --git 
a/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp 
b/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
index def758aac7c90..e2854983da552 100644
--- a/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
+++ b/clang/test/SemaSYCL/sycl-kernel-entry-point-attr-device-odr-use.cpp
@@ -133,7 +133,7 @@ void SKL::operator()() const {
 void sedf() {
   // device-note@+1 {{called by 'sedf'}}
   df();
-  // device-note@+2 {{then called by 'sedf'}}
+  // device-note@+2 {{which is called by 'sedf'}}
   // device-error@+1 {{function 'skep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
   skep();
   // device-error@+1 {{function 'mskep' cannot be used in device code because 
it is declared with the 'clang::sycl_kernel_entry_point' attribute}}
diff --git a/llvm/docs/CompileCudaWithLLVM.rst 
b/llvm/docs/CompileCudaWithLLVM.rst
index 59fa327d07d6a..a557112c9e7f3 100644
--- a/llvm/docs/CompileCudaWithLLVM.rst
+++ b/llvm/docs/CompileCudaWithLLVM.rst
@@ -433,13 +433,13 @@ Deferred Diagnostics
 --------------------
 
 In CUDA, a ``__host__ __device__`` function can be called from both host and
-device code. When such a function contains operations not valid on the device
-(e.g., calling a host-only function), clang defers the diagnostics and only
-emits them if the function is actually reachable from device code. This avoids
-false errors in ``__host__ __device__`` functions that are only used on the
-host side.
+device code. When such a function contains operations not valid on one side
+(e.g., calling a host-only function from device code), clang defers the
+diagnostics and only emits them if the function is actually reachable from a
+caller where the operation cannot be emitted. This avoids false errors in
+``__host__ __device__`` functions that are only used on the other side.
 
-For a detailed description of deferred diagnostics, device-promoted functions,
+For a detailed description of deferred diagnostics, HD-promoted functions,
 and call chain notes, see the
 `HIP Support 
<https://clang.llvm.org/docs/HIPSupport.html#deferred-diagnostics>`_
 documentation. The same mechanism applies to both CUDA and HIP.

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[clang] [llvm] [CUDA/HIP][SYCL] Deduplicate deferred diagnostics across multiple callers (PR #185926)

Reply via email to