https://github.com/xroche updated 
https://github.com/llvm/llvm-project/pull/199351

>From cf30821b4da56f53854ceb81903dd276ffbf68fe Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 23 May 2026 15:37:12 +0200
Subject: [PATCH 1/5] [Clang] Forward incoming Indirect parameters across
 musttail calls

When a C function passes a struct by value to a musttail callee,
Clang's frontend implements "by value" with a local
alloca + memcpy + pass-pointer pattern (the byval-temp). The alloca
lives in the caller's frame, which the tail call deallocates before
the callee dereferences the pointer. The callee reads freed stack
memory. Reproduces on RV64, AArch64, ARM, LoongArch64, and SystemZ at
every optimization level.

This is the argument-side analog of the SRet forwarding fix in
a96c14eeb8fc ("[Clang] Always forward sret parameters to musttail
calls", Kiran 2024-08-19). For musttail calls in the
ABIArgInfo::Indirect case, when the call argument's source LValue
resolves to a forwarded incoming Indirect parameter of the current
function with a matching ABI shape, forward the incoming llvm::Argument
directly instead of creating a byval-temp. Falls through to the
existing byval-temp path for any other source.

Three safety guards in the helper:
- ABI-attribute match (Verifier V7). Refuse to forward when the
  incoming parameter and the call slot disagree on byval.
- noalias deduplication. If the user writes
  `musttail callee(a, a)` with `a` a noalias Indirect parameter,
  do not forward the same Argument to both slots; pre-fix gave two
  distinct allocas and aliasing must not regress.
- AddrSpaceCast peek-through. EmitParmDecl wraps incoming Indirect
  parameters in addrspacecast on NVPTX/AMDGPU/SPIR. Peek through one
  cast; do NOT unwrap loads (a load through a local alloca means the
  source is a local and the fix must not engage).

Scope: this PR fixes the C source case. C++ source for the same
construct routes through CXXConstructExpr + EmitAnyExprToTemp which
materializes an agg.tmp before EmitCall runs. The fix correctly falls
through in that case (the source is the local alloca, not the
Argument) but does not eliminate the dangle. A follow-up PR will plumb
IsMustTail through EmitCallArg to cover the C++ case.

Test: clang/test/CodeGen/musttail-indirect-arg.c covers plain forward,
two-arg forward, swapped args, mixed direct+indirect, modify-then-
forward, and negative cases (local source, computed copy, non-
musttail). Runs on riscv64, aarch64, loongarch64, s390x.

Fixes #56908. Helps #116568 #157814 #46402 #190429 #56435 #72555.
Complement of the backend fix in #185094 (RISC-V) and 0be65bac6907
(LoongArch).

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
 clang/lib/CodeGen/CGCall.cpp               | 86 +++++++++++++++++++++
 clang/test/CodeGen/musttail-indirect-arg.c | 90 ++++++++++++++++++++++
 2 files changed, 176 insertions(+)
 create mode 100644 clang/test/CodeGen/musttail-indirect-arg.c

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 40cc275d40273..89aa89bfb26a4 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5448,6 +5448,69 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) {
   return MaxVectorWidth;
 }
 
+/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the
+/// incoming llvm::Argument of the current function when the call argument's
+/// source is a forwarded incoming Indirect parameter with a matching ABI
+/// shape. Returns nullptr to fall through to the normal byval-temp path.
+///
+/// Forwarding is safe under musttail's prototype-match invariant: the
+/// incoming pointer points into the caller's caller's frame and stays valid
+/// across the tail call, whereas a local alloca would dangle. This mirrors
+/// the SRet forwarding in the return path (see commit a96c14eeb8fc,
+/// "Always forward sret parameters to musttail calls").
+///
+/// Guards:
+///   - The source LValue must be the IR-level Argument of CurFn (peek through
+///     one AddrSpaceCastInst for non-default alloca address spaces; do NOT
+///     unwrap loads, since a load through a local alloca means the source
+///     IS a local).
+///   - The incoming parameter must be passed indirectly with byval-ness
+///     matching the call slot (Verifier V7).
+///   - The Argument must not already have been forwarded by a sibling call
+///     argument in this same call (noalias deduplication).
+static llvm::Argument *getForwardableIncomingMustTailArg(
+    CodeGenFunction &CGF, const CallArg &CallArgument,
+    const ABIArgInfo &CallSlotInfo,
+    llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) {
+  // The call argument can be either an LValue (DeclRefExpr to a parameter)
+  // or an RValue aggregate (typical for struct args lowered by CGCall). Both
+  // expose the underlying address; we just need the IR-level pointer.
+  Address SrcAddr = Address::invalid();
+  if (CallArgument.hasLValue())
+    SrcAddr = CallArgument.getKnownLValue().getAddress();
+  else if (CallArgument.getKnownRValue().isAggregate())
+    SrcAddr = CallArgument.getKnownRValue().getAggregateAddress();
+  else
+    return nullptr;
+  llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF);
+
+  // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect
+  // parameters in addrspacecast on targets whose alloca address space differs
+  // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR).
+  if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr))
+    SrcPtr = ASC->getOperand(0);
+
+  auto *IncomingArg = llvm::dyn_cast<llvm::Argument>(SrcPtr);
+  if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn)
+    return nullptr;
+
+  // byval-ness must match between the incoming parameter and the call slot.
+  // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so
+  // producing IR with a mismatch is a verification failure. Falling through
+  // to byval-temp is the safe behavior.
+  if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal())
+    return nullptr;
+
+  // noalias deduplication: a noalias incoming parameter must not be
+  // forwarded to two slots in the same call. Pre-fix, each slot got its
+  // own byval-temp; we must not regress that aliasing guarantee.
+  if (IncomingArg->hasNoAliasAttr() &&
+      !AlreadyForwarded.insert(IncomingArg).second)
+    return nullptr;
+
+  return IncomingArg;
+}
+
 RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
                                  const CGCallee &Callee,
                                  ReturnValueSlot ReturnValue,
@@ -5571,6 +5634,12 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
   // markers that need to be ended right after the call.
   SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall;
 
+  // For musttail calls forwarding Indirect parameters: tracks incoming
+  // Arguments already forwarded to a slot in this call, so a noalias
+  // incoming Argument is not forwarded to two slots (see
+  // getForwardableIncomingMustTailArg).
+  llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs;
+
   // Translate all of the arguments as necessary to match the IR lowering.
   assert(CallInfo.arg_size() == CallArgs.size() &&
          "Mismatch between function signature & arguments.");
@@ -5643,6 +5712,23 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
     case ABIArgInfo::Indirect:
     case ABIArgInfo::IndirectAliased: {
       assert(NumIRArgs == 1);
+
+      // For musttail calls, forward an incoming Indirect parameter directly
+      // instead of creating a byval-temp. A local alloca would be deallocated
+      // by the tail call before the callee dereferences the pointer. The
+      // incoming pointer points into the caller's caller's frame, which
+      // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc).
+      if (IsMustTail) {
+        if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg(
+                *this, *I, ArgInfo, ForwardedMustTailArgs)) {
+          llvm::Value *Val = FwdArg;
+          if (ArgHasMaybeUndefAttr)
+            Val = Builder.CreateFreeze(Val);
+          IRCallArgs[FirstIRArg] = Val;
+          break;
+        }
+      }
+
       if (I->isAggregate()) {
         // We want to avoid creating an unnecessary temporary+copy here;
         // however, we need one in three cases:
diff --git a/clang/test/CodeGen/musttail-indirect-arg.c 
b/clang/test/CodeGen/musttail-indirect-arg.c
new file mode 100644
index 0000000000000..70ef193493e27
--- /dev/null
+++ b/clang/test/CodeGen/musttail-indirect-arg.c
@@ -0,0 +1,90 @@
+// Test that Clang forwards incoming Indirect parameters across musttail calls
+// instead of creating a byval-temp alloca that would dangle after the tail 
call
+// deallocates the caller's frame.
+//
+// Companion to musttail-sret.cpp (commit a96c14eeb8fc): same idea, applied to
+// incoming arguments rather than the sret return slot.
+
+// RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | 
FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | 
FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | 
FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck 
%s --check-prefix=COMMON
+
+// A struct large enough to land on the indirect-arg path on RV64 (>2*XLEN=16
+// bytes), AArch64 (>16 bytes), LoongArch64, SystemZ.
+struct Big {
+  unsigned long long a, b, c, d;
+};
+
+// Plain forward: caller(B) musttails callee(B). The fix should emit no
+// byval-temp alloca; the call should forward the incoming parameter %a.
+struct Big C1(struct Big a);
+struct Big P1(struct Big a) {
+  __attribute__((musttail)) return C1(a);
+}
+// COMMON-LABEL: define {{.*}} @P1(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C1({{.*}} %a
+
+// Two indirect args, same forwarding: each forwards its own incoming param.
+struct Big C2(struct Big a, struct Big b);
+struct Big P2(struct Big a, struct Big b) {
+  __attribute__((musttail)) return C2(a, b);
+}
+// COMMON-LABEL: define {{.*}} @P2(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b
+
+// Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot
+// must resolve to the correct incoming Argument, not by position.
+struct Big C3(struct Big x, struct Big y);
+struct Big P3(struct Big a, struct Big b) {
+  __attribute__((musttail)) return C3(b, a);
+}
+// COMMON-LABEL: define {{.*}} @P3(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a
+
+// Mixed direct + indirect: only the indirect arg is affected by the fix.
+struct Big C4(int n, struct Big a);
+struct Big P4(int n, struct Big a) {
+  __attribute__((musttail)) return C4(n, a);
+}
+// COMMON-LABEL: define {{.*}} @P4(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a
+
+// Negative: local source. Caller takes Big a, but musttails with a LOCAL
+// Big initialized in caller's frame. The byval-temp must remain because the
+// source lives in caller's frame and would dangle if forwarded. The fix
+// must NOT engage in this case.
+struct Big C5(struct Big a);
+struct Big P5(struct Big a) {
+  struct Big local = {1, 2, 3, 4};
+  __attribute__((musttail)) return C5(local);
+}
+// COMMON-LABEL: define {{.*}} @P5(
+// COMMON: alloca
+// COMMON: musttail call {{.*}} @C5(
+
+// Negative: computed value (caller modifies the parameter then musttails).
+// The IR will use %a directly (Clang lowers writes through the incoming
+// pointer for Indirect params) so the fix does engage on the formal param,
+// but a fresh alloca is not created either way -- existing behavior.
+struct Big C6(struct Big a);
+struct Big P6(struct Big a) {
+  a.a += 1;
+  __attribute__((musttail)) return C6(a);
+}
+// COMMON-LABEL: define {{.*}} @P6(
+// COMMON-NOT: alloca %struct.Big
+// COMMON: musttail call {{.*}} @C6({{.*}} %a
+
+// Non-musttail tail call: the fix must NOT engage. Existing path emits
+// the byval-temp as before.
+struct Big C7(struct Big a);
+struct Big P7(struct Big a) {
+  return C7(a);
+}
+// COMMON-LABEL: define {{.*}} @P7(
+// COMMON-NOT: musttail

>From a1c3fd3e175cd6c26c3868460c03b931382099f9 Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 23 May 2026 16:18:46 +0200
Subject: [PATCH 2/5] [Clang][NFC] Trim comments on musttail Indirect
 forwarding helper

Reduce comment volume on getForwardableIncomingMustTailArg and the
call site to match the SRet precedent (a96c14eeb8fc). Same code,
shorter comments.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
 clang/lib/CodeGen/CGCall.cpp | 56 +++++++++---------------------------
 1 file changed, 14 insertions(+), 42 deletions(-)

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index 89aa89bfb26a4..b130df61c18b8 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5448,33 +5448,14 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) 
{
   return MaxVectorWidth;
 }
 
-/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the
-/// incoming llvm::Argument of the current function when the call argument's
-/// source is a forwarded incoming Indirect parameter with a matching ABI
-/// shape. Returns nullptr to fall through to the normal byval-temp path.
-///
-/// Forwarding is safe under musttail's prototype-match invariant: the
-/// incoming pointer points into the caller's caller's frame and stays valid
-/// across the tail call, whereas a local alloca would dangle. This mirrors
-/// the SRet forwarding in the return path (see commit a96c14eeb8fc,
-/// "Always forward sret parameters to musttail calls").
-///
-/// Guards:
-///   - The source LValue must be the IR-level Argument of CurFn (peek through
-///     one AddrSpaceCastInst for non-default alloca address spaces; do NOT
-///     unwrap loads, since a load through a local alloca means the source
-///     IS a local).
-///   - The incoming parameter must be passed indirectly with byval-ness
-///     matching the call slot (Verifier V7).
-///   - The Argument must not already have been forwarded by a sibling call
-///     argument in this same call (noalias deduplication).
+/// Returns the incoming llvm::Argument of the current function if this
+/// musttail call argument forwards an incoming Indirect parameter with a
+/// matching ABI shape; nullptr to fall through to the byval-temp path.
+/// Argument-side analog of a96c14eeb8fc (SRet musttail forwarding).
 static llvm::Argument *getForwardableIncomingMustTailArg(
     CodeGenFunction &CGF, const CallArg &CallArgument,
     const ABIArgInfo &CallSlotInfo,
     llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) {
-  // The call argument can be either an LValue (DeclRefExpr to a parameter)
-  // or an RValue aggregate (typical for struct args lowered by CGCall). Both
-  // expose the underlying address; we just need the IR-level pointer.
   Address SrcAddr = Address::invalid();
   if (CallArgument.hasLValue())
     SrcAddr = CallArgument.getKnownLValue().getAddress();
@@ -5484,9 +5465,9 @@ static llvm::Argument *getForwardableIncomingMustTailArg(
     return nullptr;
   llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF);
 
-  // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect
-  // parameters in addrspacecast on targets whose alloca address space differs
-  // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR).
+  // Peek through one AddrSpaceCastInst (NVPTX / AMDGPU / SPIR wrap incoming
+  // Indirect params via EmitParmDecl). Do not unwrap loads: a load through
+  // a local alloca means the source is a local.
   if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr))
     SrcPtr = ASC->getOperand(0);
 
@@ -5494,16 +5475,11 @@ static llvm::Argument 
*getForwardableIncomingMustTailArg(
   if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn)
     return nullptr;
 
-  // byval-ness must match between the incoming parameter and the call slot.
-  // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so
-  // producing IR with a mismatch is a verification failure. Falling through
-  // to byval-temp is the safe behavior.
+  // Verifier V7 requires matching ABI attributes across musttail.
   if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal())
     return nullptr;
 
-  // noalias deduplication: a noalias incoming parameter must not be
-  // forwarded to two slots in the same call. Pre-fix, each slot got its
-  // own byval-temp; we must not regress that aliasing guarantee.
+  // Do not forward the same noalias Argument to two slots in one call.
   if (IncomingArg->hasNoAliasAttr() &&
       !AlreadyForwarded.insert(IncomingArg).second)
     return nullptr;
@@ -5634,10 +5610,8 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
   // markers that need to be ended right after the call.
   SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall;
 
-  // For musttail calls forwarding Indirect parameters: tracks incoming
-  // Arguments already forwarded to a slot in this call, so a noalias
-  // incoming Argument is not forwarded to two slots (see
-  // getForwardableIncomingMustTailArg).
+  // Tracks incoming Arguments already forwarded by a musttail Indirect arg,
+  // for noalias deduplication in getForwardableIncomingMustTailArg.
   llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs;
 
   // Translate all of the arguments as necessary to match the IR lowering.
@@ -5713,11 +5687,9 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
     case ABIArgInfo::IndirectAliased: {
       assert(NumIRArgs == 1);
 
-      // For musttail calls, forward an incoming Indirect parameter directly
-      // instead of creating a byval-temp. A local alloca would be deallocated
-      // by the tail call before the callee dereferences the pointer. The
-      // incoming pointer points into the caller's caller's frame, which
-      // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc).
+      // For musttail, forward an incoming Indirect parameter directly. A
+      // local alloca would dangle after the tail call. Mirrors the SRet
+      // forwarding above (a96c14eeb8fc).
       if (IsMustTail) {
         if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg(
                 *this, *I, ArgInfo, ForwardedMustTailArgs)) {

>From 11cdbbffa6ae25864aafafedc745cac47c8872ab Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 23 May 2026 18:16:50 +0200
Subject: [PATCH 3/5] [Clang] Address review on musttail Indirect forwarding

- Restrict the helper to ABIArgInfo::Indirect (not IndirectAliased).
  CallSlotInfo.getIndirectByVal() asserts on IndirectAliased; falling
  through to the byval-temp path is the safe behavior for that case.
- Tighten CHECK patterns: anchor the forwarded operand by IR name
  (e.g. `@C1({{.*}} %a)`) and exclude `alloca [32 x i8]` in addition
  to `alloca %struct.Big` so a future ABI change that picks a
  different temp type cannot mask a regression.
- Add P5 cross-BB musttail (musttail behind a branch) and P7 same-
  Argument-to-two-slots (pins the noalias dedup behavior under the
  Linux C ABI where incoming Indirect params are not noalias).
- Reword P6 (caller modifies the parameter then musttails): this is
  a positive case, not a negative one. The fix correctly engages and
  eliminates the byval-temp.

No functional change to the fix shape; reviewer-flagged hardening.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
 clang/lib/CodeGen/CGCall.cpp               |  6 +-
 clang/test/CodeGen/musttail-indirect-arg.c | 92 ++++++++++++++--------
 2 files changed, 64 insertions(+), 34 deletions(-)

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index b130df61c18b8..ad402c459b892 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5689,8 +5689,10 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo 
&CallInfo,
 
       // For musttail, forward an incoming Indirect parameter directly. A
       // local alloca would dangle after the tail call. Mirrors the SRet
-      // forwarding above (a96c14eeb8fc).
-      if (IsMustTail) {
+      // forwarding above (a96c14eeb8fc). Limited to Indirect (not
+      // IndirectAliased) so the byval-ness check in the helper does not
+      // assert on getIndirectByVal().
+      if (IsMustTail && ArgInfo.isIndirect()) {
         if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg(
                 *this, *I, ArgInfo, ForwardedMustTailArgs)) {
           llvm::Value *Val = FwdArg;
diff --git a/clang/test/CodeGen/musttail-indirect-arg.c 
b/clang/test/CodeGen/musttail-indirect-arg.c
index 70ef193493e27..0e7493f9e38b7 100644
--- a/clang/test/CodeGen/musttail-indirect-arg.c
+++ b/clang/test/CodeGen/musttail-indirect-arg.c
@@ -17,14 +17,16 @@ struct Big {
 };
 
 // Plain forward: caller(B) musttails callee(B). The fix should emit no
-// byval-temp alloca; the call should forward the incoming parameter %a.
+// alloca for the forwarded arg; the call should forward the incoming
+// parameter %a.
 struct Big C1(struct Big a);
 struct Big P1(struct Big a) {
   __attribute__((musttail)) return C1(a);
 }
 // COMMON-LABEL: define {{.*}} @P1(
-// COMMON-NOT: alloca %struct.Big
-// COMMON: musttail call {{.*}} @C1({{.*}} %a
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON-NOT: = alloca [32 x i8]
+// COMMON: musttail call {{.*}} @C1({{.*}} %a)
 
 // Two indirect args, same forwarding: each forwards its own incoming param.
 struct Big C2(struct Big a, struct Big b);
@@ -32,8 +34,8 @@ struct Big P2(struct Big a, struct Big b) {
   __attribute__((musttail)) return C2(a, b);
 }
 // COMMON-LABEL: define {{.*}} @P2(
-// COMMON-NOT: alloca %struct.Big
-// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b)
 
 // Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot
 // must resolve to the correct incoming Argument, not by position.
@@ -42,8 +44,8 @@ struct Big P3(struct Big a, struct Big b) {
   __attribute__((musttail)) return C3(b, a);
 }
 // COMMON-LABEL: define {{.*}} @P3(
-// COMMON-NOT: alloca %struct.Big
-// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a)
 
 // Mixed direct + indirect: only the indirect arg is affected by the fix.
 struct Big C4(int n, struct Big a);
@@ -51,40 +53,66 @@ struct Big P4(int n, struct Big a) {
   __attribute__((musttail)) return C4(n, a);
 }
 // COMMON-LABEL: define {{.*}} @P4(
-// COMMON-NOT: alloca %struct.Big
-// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a)
 
-// Negative: local source. Caller takes Big a, but musttails with a LOCAL
-// Big initialized in caller's frame. The byval-temp must remain because the
-// source lives in caller's frame and would dangle if forwarded. The fix
-// must NOT engage in this case.
+// Caller modifies the parameter before the musttail. Clang lowers the
+// write through the incoming pointer, and the fix forwards the same
+// pointer to the callee. No byval-temp.
 struct Big C5(struct Big a);
 struct Big P5(struct Big a) {
-  struct Big local = {1, 2, 3, 4};
-  __attribute__((musttail)) return C5(local);
+  a.a += 1;
+  __attribute__((musttail)) return C5(a);
 }
 // COMMON-LABEL: define {{.*}} @P5(
-// COMMON: alloca
-// COMMON: musttail call {{.*}} @C5(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @C5({{.*}} %a)
 
-// Negative: computed value (caller modifies the parameter then musttails).
-// The IR will use %a directly (Clang lowers writes through the incoming
-// pointer for Indirect params) so the fix does engage on the formal param,
-// but a fresh alloca is not created either way -- existing behavior.
-struct Big C6(struct Big a);
-struct Big P6(struct Big a) {
-  a.a += 1;
-  __attribute__((musttail)) return C6(a);
+// musttail behind a branch: the forwarded pointer must remain live across
+// the basic block transition. Tests that the helper does not assume the
+// musttail is in the entry block.
+struct Big C6(struct Big a, int cond);
+struct Big P6(struct Big a, int cond) {
+  if (cond)
+    __attribute__((musttail)) return C6(a, cond);
+  return a;
 }
 // COMMON-LABEL: define {{.*}} @P6(
-// COMMON-NOT: alloca %struct.Big
-// COMMON: musttail call {{.*}} @C6({{.*}} %a
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @C6({{.*}} %a,
 
-// Non-musttail tail call: the fix must NOT engage. Existing path emits
-// the byval-temp as before.
-struct Big C7(struct Big a);
-struct Big P7(struct Big a) {
-  return C7(a);
+// Same Argument forwarded to two slots: the helper engages for both. The
+// noalias deduplication, if it ever fired, would force the second slot
+// back to a byval-temp; but incoming Indirect params under the Linux C
+// ABI are not noalias, so both slots forward %a directly. This pins the
+// behavior so a future change introducing noalias on Indirect params
+// would surface here. (musttail requires matching prototypes, so caller
+// and callee both take two Big args.)
+struct Big C7(struct Big x, struct Big y);
+struct Big P7(struct Big a, struct Big b) {
+  __attribute__((musttail)) return C7(a, a);
 }
 // COMMON-LABEL: define {{.*}} @P7(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @C7({{.*}} %a, {{.*}} %a)
+
+// Negative: local source. Caller takes Big a, but musttails with a LOCAL
+// Big initialized in caller's frame. The byval-temp must remain because the
+// source lives in caller's frame and would dangle if forwarded.
+struct Big C8(struct Big a);
+struct Big P8(struct Big a) {
+  struct Big local = {1, 2, 3, 4};
+  __attribute__((musttail)) return C8(local);
+}
+// COMMON-LABEL: define {{.*}} @P8(
+// COMMON: = alloca
+// COMMON: musttail call {{.*}} @C8(
+
+// Non-musttail tail call: the fix must NOT engage. Existing path emits
+// the byval-temp as before, no musttail in the IR.
+struct Big C9(struct Big a);
+struct Big P9(struct Big a) {
+  return C9(a);
+}
+// COMMON-LABEL: define {{.*}} @P9(
 // COMMON-NOT: musttail

>From 43590837d6a8bf2d8120be835dc0cfd2817d9822 Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 23 May 2026 21:20:58 +0200
Subject: [PATCH 4/5] [Clang] Extend musttail Indirect forwarding to C++
 trivial-copy args

For C++ struct-by-value arguments, the call argument is a
CXXConstructExpr invoking the implicit copy constructor, not the
LValueToRValue cast that EmitCallArg uses for C. The trivial copy
materializes an agg.tmp before EmitCall runs, so the helper in
EmitCall (getForwardableIncomingMustTailArg) sees a local alloca and
falls through.

Detect the trivial-copy CXXConstructExpr whose source is a
ParmVarDecl of the current function inside a musttail call (signaled
by CodeGenFunction::MustTailCall being non-null) and take the
addUncopiedAggregate path so the LValue of the original parameter
reaches EmitCall. The forwarding helper then forwards the incoming
Argument, no agg.tmp.

Limited to trivially-copyable types so we never elide observable
copy constructors or skip destruction. Non-trivial copy ctors / dtors
still take the existing materialize-then-copy path.

Test: clang/test/CodeGen/musttail-indirect-arg.cpp (companion to the
C test) covers plain forward, two-arg, swapped, and verifies the
trivial-copy elision does NOT engage for non-trivial copy ctors or
non-musttail tail calls.

clang/test/CodeGen + CodeGenCXX (7307 tests) clean. Runtime cross-arch
sweep on a C++ minimal repro passes on x86_64, riscv64, aarch64, arm,
loongarch64, s390x; the same repro fails on RV64/AArch64 without this
commit, confirming the new path catches the C++ bug shape.

Fixes the C++ scope hole called out in the prior commit
(11cdbbffa6ae).

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
 clang/lib/CodeGen/CGCall.cpp                 | 24 +++++++
 clang/test/CodeGen/musttail-indirect-arg.cpp | 73 ++++++++++++++++++++
 2 files changed, 97 insertions(+)
 create mode 100644 clang/test/CodeGen/musttail-indirect-arg.cpp

diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index ad402c459b892..c34c44571b33d 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5139,6 +5139,30 @@ void CodeGenFunction::EmitCallArg(CallArgList &args, 
const Expr *E,
     return;
   }
 
+  // For musttail calls in C++, skip the agg.tmp copy when the source is a
+  // trivially-copyable parameter of the current function. The copy would
+  // live in our frame, which the tail call deallocates before the callee
+  // dereferences it. The companion fix in EmitCall (getForwardableIncoming
+  // MustTailArg) forwards the incoming Indirect Argument; for that to fire
+  // here we must hand it the original parameter LValue, not a fresh temp.
+  if (HasAggregateEvalKind && MustTailCall && type->isRecordType() &&
+      type.isTriviallyCopyableType(getContext())) {
+    if (const auto *CCE = dyn_cast<CXXConstructExpr>(E)) {
+      if (CCE->getConstructor()->isCopyOrMoveConstructor() &&
+          CCE->getConstructor()->isTrivial() && CCE->getNumArgs() == 1) {
+        const Expr *Source = CCE->getArg(0)->IgnoreParenImpCasts();
+        if (const auto *DRE = dyn_cast<DeclRefExpr>(Source))
+          if (const auto *PVD = dyn_cast<ParmVarDecl>(DRE->getDecl()))
+            if (PVD->getDeclContext() == dyn_cast<DeclContext>(CurCodeDecl)) {
+              LValue L = EmitLValue(DRE);
+              assert(L.isSimple());
+              args.addUncopiedAggregate(L, type);
+              return;
+            }
+      }
+    }
+  }
+
   args.add(EmitAnyExprToTemp(E), type);
 }
 
diff --git a/clang/test/CodeGen/musttail-indirect-arg.cpp 
b/clang/test/CodeGen/musttail-indirect-arg.cpp
new file mode 100644
index 0000000000000..a161f979b5a4c
--- /dev/null
+++ b/clang/test/CodeGen/musttail-indirect-arg.cpp
@@ -0,0 +1,73 @@
+// Test that Clang forwards incoming Indirect parameters across musttail calls
+// for C++ struct-by-value arguments with trivially-copyable types. Companion 
to
+// musttail-indirect-arg.c (the C side of the same fix) and musttail-sret.cpp
+// (the SRet precedent in a96c14eeb8fc).
+//
+// C++ goes through a different EmitCallArg path than C: the call argument is a
+// CXXConstructExpr invoking the implicit copy constructor, which would
+// otherwise materialize an agg.tmp before EmitCall. For musttail with a
+// trivially-copyable parameter forwarded directly, the copy is elided so the
+// helper in EmitCall can forward the incoming llvm::Argument.
+
+// RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | 
FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | 
FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | 
FileCheck %s --check-prefix=COMMON
+// RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck 
%s --check-prefix=COMMON
+
+// A trivially-copyable struct large enough to land on the indirect-arg path
+// on RV64, AArch64, LoongArch64, SystemZ.
+struct Big {
+  unsigned long long a, b, c, d;
+};
+
+// Plain forward: caller(B) musttails callee(B). No agg.tmp copy, no
+// byval-temp; the incoming parameter %a is forwarded directly.
+struct Big C1(struct Big a);
+struct Big P1(struct Big a) {
+  [[clang::musttail]] return C1(a);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P13Big(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @_Z2C13Big({{.*}} %a)
+
+// Two args, same forwarding.
+struct Big C2(struct Big a, struct Big b);
+struct Big P2(struct Big a, struct Big b) {
+  [[clang::musttail]] return C2(a, b);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P23BigS_(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @_Z2C23BigS_({{.*}} %a, {{.*}} %b)
+
+// Swapped args.
+struct Big C3(struct Big x, struct Big y);
+struct Big P3(struct Big a, struct Big b) {
+  [[clang::musttail]] return C3(b, a);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P33BigS_(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @_Z2C33BigS_({{.*}} %b, {{.*}} %a)
+
+// Non-trivial copy constructor: the trivial-copy elision must NOT engage.
+// Existing path materializes the agg.tmp (the user-defined copy ctor has
+// observable behavior).
+struct NonTrivial {
+  unsigned long long parts[4];
+  NonTrivial(const NonTrivial &);
+};
+NonTrivial C4(NonTrivial a);
+NonTrivial P4(NonTrivial a) {
+  [[clang::musttail]] return C4(a);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P410NonTrivial(
+// The user-defined copy ctor IS called (the agg.tmp pattern still happens):
+// COMMON: call {{.*}} @_ZN10NonTrivialC1ERKS_
+
+// Non-musttail tail call: trivial-copy elision must NOT engage; the regular
+// agg.tmp copy is still emitted.
+struct Big C5(struct Big a);
+struct Big P5(struct Big a) {
+  return C5(a);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P53Big(
+// COMMON-NOT: musttail

>From 9b1aa8b1228237096222c71ecaa4ea73c5b5b96c Mon Sep 17 00:00:00 2001
From: Xavier Roche <[email protected]>
Date: Sat, 23 May 2026 21:37:39 +0200
Subject: [PATCH 5/5] [Clang][test] Mirror C test cases in
 musttail-indirect-arg.cpp

Add the C++ analogs of P5/P6/P7/P8 from the C test that the C++ test
was missing per code review:

- P5: modify-then-musttail (trivial-copy elision still engages).
- P6: cross-BB musttail (elision works across basic blocks).
- P7: same Argument forwarded to two slots (pins noalias dedup
  behavior; incoming Indirect params are not noalias under Linux C++).
- P8: source is a non-parameter local (elision must NOT engage,
  byval-temp pattern remains).

Existing non-musttail test renumbered P5 -> P9.

Closes the test-coverage gap flagged by the second review on commit
43590837d6a8.

Assisted-by: Claude (Anthropic)
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
---
 clang/test/CodeGen/musttail-indirect-arg.cpp | 52 ++++++++++++++++++--
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/clang/test/CodeGen/musttail-indirect-arg.cpp 
b/clang/test/CodeGen/musttail-indirect-arg.cpp
index a161f979b5a4c..4c3203ca12cf2 100644
--- a/clang/test/CodeGen/musttail-indirect-arg.cpp
+++ b/clang/test/CodeGen/musttail-indirect-arg.cpp
@@ -63,11 +63,57 @@ NonTrivial P4(NonTrivial a) {
 // The user-defined copy ctor IS called (the agg.tmp pattern still happens):
 // COMMON: call {{.*}} @_ZN10NonTrivialC1ERKS_
 
-// Non-musttail tail call: trivial-copy elision must NOT engage; the regular
-// agg.tmp copy is still emitted.
+// Caller modifies the parameter before the musttail. The trivial-copy
+// elision still engages because the source LValue is still the parameter;
+// any mutation flowed through the incoming pointer is observed by the
+// forwarded call.
 struct Big C5(struct Big a);
 struct Big P5(struct Big a) {
-  return C5(a);
+  a.a += 1;
+  [[clang::musttail]] return C5(a);
 }
 // COMMON-LABEL: define {{.*}} @_Z2P53Big(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @_Z2C53Big({{.*}} %a)
+
+// musttail behind a branch: trivial-copy elision must work across BBs.
+struct Big C6(struct Big a, int cond);
+struct Big P6(struct Big a, int cond) {
+  if (cond)
+    [[clang::musttail]] return C6(a, cond);
+  return a;
+}
+// COMMON-LABEL: define {{.*}} @_Z2P63Bigi(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @_Z2C63Bigi({{.*}} %a,
+
+// Same Argument forwarded to two slots: both engage. Incoming Indirect
+// params are not noalias under the Linux C++ ABI so the dedup in the
+// helper does not fire and both slots forward %a directly.
+struct Big C7(struct Big x, struct Big y);
+struct Big P7(struct Big a, struct Big b) {
+  [[clang::musttail]] return C7(a, a);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P73BigS_(
+// COMMON-NOT: = alloca {{.*}}struct.Big
+// COMMON: musttail call {{.*}} @_Z2C73BigS_({{.*}} %a, {{.*}} %a)
+
+// Negative: source is a local, not a parameter. The trivial-copy elision
+// must NOT engage; the byval-temp pattern remains.
+struct Big C8(struct Big a);
+struct Big P8(struct Big a) {
+  struct Big local = {1, 2, 3, 4};
+  [[clang::musttail]] return C8(local);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P83Big(
+// COMMON: = alloca
+// COMMON: musttail call {{.*}} @_Z2C83Big(
+
+// Non-musttail tail call: trivial-copy elision must NOT engage; the regular
+// agg.tmp copy is still emitted.
+struct Big C9(struct Big a);
+struct Big P9(struct Big a) {
+  return C9(a);
+}
+// COMMON-LABEL: define {{.*}} @_Z2P93Big(
 // COMMON-NOT: musttail

_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to