https://github.com/xroche updated https://github.com/llvm/llvm-project/pull/199351
>From cf30821b4da56f53854ceb81903dd276ffbf68fe Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 15:37:12 +0200 Subject: [PATCH 1/6] [Clang] Forward incoming Indirect parameters across musttail calls When a C function passes a struct by value to a musttail callee, Clang's frontend implements "by value" with a local alloca + memcpy + pass-pointer pattern (the byval-temp). The alloca lives in the caller's frame, which the tail call deallocates before the callee dereferences the pointer. The callee reads freed stack memory. Reproduces on RV64, AArch64, ARM, LoongArch64, and SystemZ at every optimization level. This is the argument-side analog of the SRet forwarding fix in a96c14eeb8fc ("[Clang] Always forward sret parameters to musttail calls", Kiran 2024-08-19). For musttail calls in the ABIArgInfo::Indirect case, when the call argument's source LValue resolves to a forwarded incoming Indirect parameter of the current function with a matching ABI shape, forward the incoming llvm::Argument directly instead of creating a byval-temp. Falls through to the existing byval-temp path for any other source. Three safety guards in the helper: - ABI-attribute match (Verifier V7). Refuse to forward when the incoming parameter and the call slot disagree on byval. - noalias deduplication. If the user writes `musttail callee(a, a)` with `a` a noalias Indirect parameter, do not forward the same Argument to both slots; pre-fix gave two distinct allocas and aliasing must not regress. - AddrSpaceCast peek-through. EmitParmDecl wraps incoming Indirect parameters in addrspacecast on NVPTX/AMDGPU/SPIR. Peek through one cast; do NOT unwrap loads (a load through a local alloca means the source is a local and the fix must not engage). Scope: this PR fixes the C source case. C++ source for the same construct routes through CXXConstructExpr + EmitAnyExprToTemp which materializes an agg.tmp before EmitCall runs. The fix correctly falls through in that case (the source is the local alloca, not the Argument) but does not eliminate the dangle. A follow-up PR will plumb IsMustTail through EmitCallArg to cover the C++ case. Test: clang/test/CodeGen/musttail-indirect-arg.c covers plain forward, two-arg forward, swapped args, mixed direct+indirect, modify-then- forward, and negative cases (local source, computed copy, non- musttail). Runs on riscv64, aarch64, loongarch64, s390x. Fixes #56908. Helps #116568 #157814 #46402 #190429 #56435 #72555. Complement of the backend fix in #185094 (RISC-V) and 0be65bac6907 (LoongArch). Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 86 +++++++++++++++++++++ clang/test/CodeGen/musttail-indirect-arg.c | 90 ++++++++++++++++++++++ 2 files changed, 176 insertions(+) create mode 100644 clang/test/CodeGen/musttail-indirect-arg.c diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index 40cc275d40273..89aa89bfb26a4 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5448,6 +5448,69 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) { return MaxVectorWidth; } +/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the +/// incoming llvm::Argument of the current function when the call argument's +/// source is a forwarded incoming Indirect parameter with a matching ABI +/// shape. Returns nullptr to fall through to the normal byval-temp path. +/// +/// Forwarding is safe under musttail's prototype-match invariant: the +/// incoming pointer points into the caller's caller's frame and stays valid +/// across the tail call, whereas a local alloca would dangle. This mirrors +/// the SRet forwarding in the return path (see commit a96c14eeb8fc, +/// "Always forward sret parameters to musttail calls"). +/// +/// Guards: +/// - The source LValue must be the IR-level Argument of CurFn (peek through +/// one AddrSpaceCastInst for non-default alloca address spaces; do NOT +/// unwrap loads, since a load through a local alloca means the source +/// IS a local). +/// - The incoming parameter must be passed indirectly with byval-ness +/// matching the call slot (Verifier V7). +/// - The Argument must not already have been forwarded by a sibling call +/// argument in this same call (noalias deduplication). +static llvm::Argument *getForwardableIncomingMustTailArg( + CodeGenFunction &CGF, const CallArg &CallArgument, + const ABIArgInfo &CallSlotInfo, + llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) { + // The call argument can be either an LValue (DeclRefExpr to a parameter) + // or an RValue aggregate (typical for struct args lowered by CGCall). Both + // expose the underlying address; we just need the IR-level pointer. + Address SrcAddr = Address::invalid(); + if (CallArgument.hasLValue()) + SrcAddr = CallArgument.getKnownLValue().getAddress(); + else if (CallArgument.getKnownRValue().isAggregate()) + SrcAddr = CallArgument.getKnownRValue().getAggregateAddress(); + else + return nullptr; + llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF); + + // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect + // parameters in addrspacecast on targets whose alloca address space differs + // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR). + if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr)) + SrcPtr = ASC->getOperand(0); + + auto *IncomingArg = llvm::dyn_cast<llvm::Argument>(SrcPtr); + if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn) + return nullptr; + + // byval-ness must match between the incoming parameter and the call slot. + // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so + // producing IR with a mismatch is a verification failure. Falling through + // to byval-temp is the safe behavior. + if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal()) + return nullptr; + + // noalias deduplication: a noalias incoming parameter must not be + // forwarded to two slots in the same call. Pre-fix, each slot got its + // own byval-temp; we must not regress that aliasing guarantee. + if (IncomingArg->hasNoAliasAttr() && + !AlreadyForwarded.insert(IncomingArg).second) + return nullptr; + + return IncomingArg; +} + RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, const CGCallee &Callee, ReturnValueSlot ReturnValue, @@ -5571,6 +5634,12 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, // markers that need to be ended right after the call. SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall; + // For musttail calls forwarding Indirect parameters: tracks incoming + // Arguments already forwarded to a slot in this call, so a noalias + // incoming Argument is not forwarded to two slots (see + // getForwardableIncomingMustTailArg). + llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs; + // Translate all of the arguments as necessary to match the IR lowering. assert(CallInfo.arg_size() == CallArgs.size() && "Mismatch between function signature & arguments."); @@ -5643,6 +5712,23 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, case ABIArgInfo::Indirect: case ABIArgInfo::IndirectAliased: { assert(NumIRArgs == 1); + + // For musttail calls, forward an incoming Indirect parameter directly + // instead of creating a byval-temp. A local alloca would be deallocated + // by the tail call before the callee dereferences the pointer. The + // incoming pointer points into the caller's caller's frame, which + // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc). + if (IsMustTail) { + if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg( + *this, *I, ArgInfo, ForwardedMustTailArgs)) { + llvm::Value *Val = FwdArg; + if (ArgHasMaybeUndefAttr) + Val = Builder.CreateFreeze(Val); + IRCallArgs[FirstIRArg] = Val; + break; + } + } + if (I->isAggregate()) { // We want to avoid creating an unnecessary temporary+copy here; // however, we need one in three cases: diff --git a/clang/test/CodeGen/musttail-indirect-arg.c b/clang/test/CodeGen/musttail-indirect-arg.c new file mode 100644 index 0000000000000..70ef193493e27 --- /dev/null +++ b/clang/test/CodeGen/musttail-indirect-arg.c @@ -0,0 +1,90 @@ +// Test that Clang forwards incoming Indirect parameters across musttail calls +// instead of creating a byval-temp alloca that would dangle after the tail call +// deallocates the caller's frame. +// +// Companion to musttail-sret.cpp (commit a96c14eeb8fc): same idea, applied to +// incoming arguments rather than the sret return slot. + +// RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON + +// A struct large enough to land on the indirect-arg path on RV64 (>2*XLEN=16 +// bytes), AArch64 (>16 bytes), LoongArch64, SystemZ. +struct Big { + unsigned long long a, b, c, d; +}; + +// Plain forward: caller(B) musttails callee(B). The fix should emit no +// byval-temp alloca; the call should forward the incoming parameter %a. +struct Big C1(struct Big a); +struct Big P1(struct Big a) { + __attribute__((musttail)) return C1(a); +} +// COMMON-LABEL: define {{.*}} @P1( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C1({{.*}} %a + +// Two indirect args, same forwarding: each forwards its own incoming param. +struct Big C2(struct Big a, struct Big b); +struct Big P2(struct Big a, struct Big b) { + __attribute__((musttail)) return C2(a, b); +} +// COMMON-LABEL: define {{.*}} @P2( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b + +// Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot +// must resolve to the correct incoming Argument, not by position. +struct Big C3(struct Big x, struct Big y); +struct Big P3(struct Big a, struct Big b) { + __attribute__((musttail)) return C3(b, a); +} +// COMMON-LABEL: define {{.*}} @P3( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a + +// Mixed direct + indirect: only the indirect arg is affected by the fix. +struct Big C4(int n, struct Big a); +struct Big P4(int n, struct Big a) { + __attribute__((musttail)) return C4(n, a); +} +// COMMON-LABEL: define {{.*}} @P4( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a + +// Negative: local source. Caller takes Big a, but musttails with a LOCAL +// Big initialized in caller's frame. The byval-temp must remain because the +// source lives in caller's frame and would dangle if forwarded. The fix +// must NOT engage in this case. +struct Big C5(struct Big a); +struct Big P5(struct Big a) { + struct Big local = {1, 2, 3, 4}; + __attribute__((musttail)) return C5(local); +} +// COMMON-LABEL: define {{.*}} @P5( +// COMMON: alloca +// COMMON: musttail call {{.*}} @C5( + +// Negative: computed value (caller modifies the parameter then musttails). +// The IR will use %a directly (Clang lowers writes through the incoming +// pointer for Indirect params) so the fix does engage on the formal param, +// but a fresh alloca is not created either way -- existing behavior. +struct Big C6(struct Big a); +struct Big P6(struct Big a) { + a.a += 1; + __attribute__((musttail)) return C6(a); +} +// COMMON-LABEL: define {{.*}} @P6( +// COMMON-NOT: alloca %struct.Big +// COMMON: musttail call {{.*}} @C6({{.*}} %a + +// Non-musttail tail call: the fix must NOT engage. Existing path emits +// the byval-temp as before. +struct Big C7(struct Big a); +struct Big P7(struct Big a) { + return C7(a); +} +// COMMON-LABEL: define {{.*}} @P7( +// COMMON-NOT: musttail >From a1c3fd3e175cd6c26c3868460c03b931382099f9 Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 16:18:46 +0200 Subject: [PATCH 2/6] [Clang][NFC] Trim comments on musttail Indirect forwarding helper Reduce comment volume on getForwardableIncomingMustTailArg and the call site to match the SRet precedent (a96c14eeb8fc). Same code, shorter comments. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 56 +++++++++--------------------------- 1 file changed, 14 insertions(+), 42 deletions(-) diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index 89aa89bfb26a4..b130df61c18b8 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5448,33 +5448,14 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) { return MaxVectorWidth; } -/// For a musttail call argument lowered as ABIArgInfo::Indirect, returns the -/// incoming llvm::Argument of the current function when the call argument's -/// source is a forwarded incoming Indirect parameter with a matching ABI -/// shape. Returns nullptr to fall through to the normal byval-temp path. -/// -/// Forwarding is safe under musttail's prototype-match invariant: the -/// incoming pointer points into the caller's caller's frame and stays valid -/// across the tail call, whereas a local alloca would dangle. This mirrors -/// the SRet forwarding in the return path (see commit a96c14eeb8fc, -/// "Always forward sret parameters to musttail calls"). -/// -/// Guards: -/// - The source LValue must be the IR-level Argument of CurFn (peek through -/// one AddrSpaceCastInst for non-default alloca address spaces; do NOT -/// unwrap loads, since a load through a local alloca means the source -/// IS a local). -/// - The incoming parameter must be passed indirectly with byval-ness -/// matching the call slot (Verifier V7). -/// - The Argument must not already have been forwarded by a sibling call -/// argument in this same call (noalias deduplication). +/// Returns the incoming llvm::Argument of the current function if this +/// musttail call argument forwards an incoming Indirect parameter with a +/// matching ABI shape; nullptr to fall through to the byval-temp path. +/// Argument-side analog of a96c14eeb8fc (SRet musttail forwarding). static llvm::Argument *getForwardableIncomingMustTailArg( CodeGenFunction &CGF, const CallArg &CallArgument, const ABIArgInfo &CallSlotInfo, llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) { - // The call argument can be either an LValue (DeclRefExpr to a parameter) - // or an RValue aggregate (typical for struct args lowered by CGCall). Both - // expose the underlying address; we just need the IR-level pointer. Address SrcAddr = Address::invalid(); if (CallArgument.hasLValue()) SrcAddr = CallArgument.getKnownLValue().getAddress(); @@ -5484,9 +5465,9 @@ static llvm::Argument *getForwardableIncomingMustTailArg( return nullptr; llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF); - // Peek through one AddrSpaceCastInst. EmitParmDecl wraps incoming Indirect - // parameters in addrspacecast on targets whose alloca address space differs - // from the parameter's pointer address space (NVPTX / AMDGPU / SPIR). + // Peek through one AddrSpaceCastInst (NVPTX / AMDGPU / SPIR wrap incoming + // Indirect params via EmitParmDecl). Do not unwrap loads: a load through + // a local alloca means the source is a local. if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr)) SrcPtr = ASC->getOperand(0); @@ -5494,16 +5475,11 @@ static llvm::Argument *getForwardableIncomingMustTailArg( if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn) return nullptr; - // byval-ness must match between the incoming parameter and the call slot. - // The Verifier rejects musttail across an ABI-attribute mismatch (V7), so - // producing IR with a mismatch is a verification failure. Falling through - // to byval-temp is the safe behavior. + // Verifier V7 requires matching ABI attributes across musttail. if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal()) return nullptr; - // noalias deduplication: a noalias incoming parameter must not be - // forwarded to two slots in the same call. Pre-fix, each slot got its - // own byval-temp; we must not regress that aliasing guarantee. + // Do not forward the same noalias Argument to two slots in one call. if (IncomingArg->hasNoAliasAttr() && !AlreadyForwarded.insert(IncomingArg).second) return nullptr; @@ -5634,10 +5610,8 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, // markers that need to be ended right after the call. SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall; - // For musttail calls forwarding Indirect parameters: tracks incoming - // Arguments already forwarded to a slot in this call, so a noalias - // incoming Argument is not forwarded to two slots (see - // getForwardableIncomingMustTailArg). + // Tracks incoming Arguments already forwarded by a musttail Indirect arg, + // for noalias deduplication in getForwardableIncomingMustTailArg. llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs; // Translate all of the arguments as necessary to match the IR lowering. @@ -5713,11 +5687,9 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, case ABIArgInfo::IndirectAliased: { assert(NumIRArgs == 1); - // For musttail calls, forward an incoming Indirect parameter directly - // instead of creating a byval-temp. A local alloca would be deallocated - // by the tail call before the callee dereferences the pointer. The - // incoming pointer points into the caller's caller's frame, which - // remains valid. Mirrors the SRet forwarding above (a96c14eeb8fc). + // For musttail, forward an incoming Indirect parameter directly. A + // local alloca would dangle after the tail call. Mirrors the SRet + // forwarding above (a96c14eeb8fc). if (IsMustTail) { if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg( *this, *I, ArgInfo, ForwardedMustTailArgs)) { >From 11cdbbffa6ae25864aafafedc745cac47c8872ab Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 18:16:50 +0200 Subject: [PATCH 3/6] [Clang] Address review on musttail Indirect forwarding - Restrict the helper to ABIArgInfo::Indirect (not IndirectAliased). CallSlotInfo.getIndirectByVal() asserts on IndirectAliased; falling through to the byval-temp path is the safe behavior for that case. - Tighten CHECK patterns: anchor the forwarded operand by IR name (e.g. `@C1({{.*}} %a)`) and exclude `alloca [32 x i8]` in addition to `alloca %struct.Big` so a future ABI change that picks a different temp type cannot mask a regression. - Add P5 cross-BB musttail (musttail behind a branch) and P7 same- Argument-to-two-slots (pins the noalias dedup behavior under the Linux C ABI where incoming Indirect params are not noalias). - Reword P6 (caller modifies the parameter then musttails): this is a positive case, not a negative one. The fix correctly engages and eliminates the byval-temp. No functional change to the fix shape; reviewer-flagged hardening. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 6 +- clang/test/CodeGen/musttail-indirect-arg.c | 92 ++++++++++++++-------- 2 files changed, 64 insertions(+), 34 deletions(-) diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index b130df61c18b8..ad402c459b892 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5689,8 +5689,10 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, // For musttail, forward an incoming Indirect parameter directly. A // local alloca would dangle after the tail call. Mirrors the SRet - // forwarding above (a96c14eeb8fc). - if (IsMustTail) { + // forwarding above (a96c14eeb8fc). Limited to Indirect (not + // IndirectAliased) so the byval-ness check in the helper does not + // assert on getIndirectByVal(). + if (IsMustTail && ArgInfo.isIndirect()) { if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg( *this, *I, ArgInfo, ForwardedMustTailArgs)) { llvm::Value *Val = FwdArg; diff --git a/clang/test/CodeGen/musttail-indirect-arg.c b/clang/test/CodeGen/musttail-indirect-arg.c index 70ef193493e27..0e7493f9e38b7 100644 --- a/clang/test/CodeGen/musttail-indirect-arg.c +++ b/clang/test/CodeGen/musttail-indirect-arg.c @@ -17,14 +17,16 @@ struct Big { }; // Plain forward: caller(B) musttails callee(B). The fix should emit no -// byval-temp alloca; the call should forward the incoming parameter %a. +// alloca for the forwarded arg; the call should forward the incoming +// parameter %a. struct Big C1(struct Big a); struct Big P1(struct Big a) { __attribute__((musttail)) return C1(a); } // COMMON-LABEL: define {{.*}} @P1( -// COMMON-NOT: alloca %struct.Big -// COMMON: musttail call {{.*}} @C1({{.*}} %a +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON-NOT: = alloca [32 x i8] +// COMMON: musttail call {{.*}} @C1({{.*}} %a) // Two indirect args, same forwarding: each forwards its own incoming param. struct Big C2(struct Big a, struct Big b); @@ -32,8 +34,8 @@ struct Big P2(struct Big a, struct Big b) { __attribute__((musttail)) return C2(a, b); } // COMMON-LABEL: define {{.*}} @P2( -// COMMON-NOT: alloca %struct.Big -// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b) // Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot // must resolve to the correct incoming Argument, not by position. @@ -42,8 +44,8 @@ struct Big P3(struct Big a, struct Big b) { __attribute__((musttail)) return C3(b, a); } // COMMON-LABEL: define {{.*}} @P3( -// COMMON-NOT: alloca %struct.Big -// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a) // Mixed direct + indirect: only the indirect arg is affected by the fix. struct Big C4(int n, struct Big a); @@ -51,40 +53,66 @@ struct Big P4(int n, struct Big a) { __attribute__((musttail)) return C4(n, a); } // COMMON-LABEL: define {{.*}} @P4( -// COMMON-NOT: alloca %struct.Big -// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a) -// Negative: local source. Caller takes Big a, but musttails with a LOCAL -// Big initialized in caller's frame. The byval-temp must remain because the -// source lives in caller's frame and would dangle if forwarded. The fix -// must NOT engage in this case. +// Caller modifies the parameter before the musttail. Clang lowers the +// write through the incoming pointer, and the fix forwards the same +// pointer to the callee. No byval-temp. struct Big C5(struct Big a); struct Big P5(struct Big a) { - struct Big local = {1, 2, 3, 4}; - __attribute__((musttail)) return C5(local); + a.a += 1; + __attribute__((musttail)) return C5(a); } // COMMON-LABEL: define {{.*}} @P5( -// COMMON: alloca -// COMMON: musttail call {{.*}} @C5( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C5({{.*}} %a) -// Negative: computed value (caller modifies the parameter then musttails). -// The IR will use %a directly (Clang lowers writes through the incoming -// pointer for Indirect params) so the fix does engage on the formal param, -// but a fresh alloca is not created either way -- existing behavior. -struct Big C6(struct Big a); -struct Big P6(struct Big a) { - a.a += 1; - __attribute__((musttail)) return C6(a); +// musttail behind a branch: the forwarded pointer must remain live across +// the basic block transition. Tests that the helper does not assume the +// musttail is in the entry block. +struct Big C6(struct Big a, int cond); +struct Big P6(struct Big a, int cond) { + if (cond) + __attribute__((musttail)) return C6(a, cond); + return a; } // COMMON-LABEL: define {{.*}} @P6( -// COMMON-NOT: alloca %struct.Big -// COMMON: musttail call {{.*}} @C6({{.*}} %a +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C6({{.*}} %a, -// Non-musttail tail call: the fix must NOT engage. Existing path emits -// the byval-temp as before. -struct Big C7(struct Big a); -struct Big P7(struct Big a) { - return C7(a); +// Same Argument forwarded to two slots: the helper engages for both. The +// noalias deduplication, if it ever fired, would force the second slot +// back to a byval-temp; but incoming Indirect params under the Linux C +// ABI are not noalias, so both slots forward %a directly. This pins the +// behavior so a future change introducing noalias on Indirect params +// would surface here. (musttail requires matching prototypes, so caller +// and callee both take two Big args.) +struct Big C7(struct Big x, struct Big y); +struct Big P7(struct Big a, struct Big b) { + __attribute__((musttail)) return C7(a, a); } // COMMON-LABEL: define {{.*}} @P7( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C7({{.*}} %a, {{.*}} %a) + +// Negative: local source. Caller takes Big a, but musttails with a LOCAL +// Big initialized in caller's frame. The byval-temp must remain because the +// source lives in caller's frame and would dangle if forwarded. +struct Big C8(struct Big a); +struct Big P8(struct Big a) { + struct Big local = {1, 2, 3, 4}; + __attribute__((musttail)) return C8(local); +} +// COMMON-LABEL: define {{.*}} @P8( +// COMMON: = alloca +// COMMON: musttail call {{.*}} @C8( + +// Non-musttail tail call: the fix must NOT engage. Existing path emits +// the byval-temp as before, no musttail in the IR. +struct Big C9(struct Big a); +struct Big P9(struct Big a) { + return C9(a); +} +// COMMON-LABEL: define {{.*}} @P9( // COMMON-NOT: musttail >From 43590837d6a8bf2d8120be835dc0cfd2817d9822 Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 21:20:58 +0200 Subject: [PATCH 4/6] [Clang] Extend musttail Indirect forwarding to C++ trivial-copy args For C++ struct-by-value arguments, the call argument is a CXXConstructExpr invoking the implicit copy constructor, not the LValueToRValue cast that EmitCallArg uses for C. The trivial copy materializes an agg.tmp before EmitCall runs, so the helper in EmitCall (getForwardableIncomingMustTailArg) sees a local alloca and falls through. Detect the trivial-copy CXXConstructExpr whose source is a ParmVarDecl of the current function inside a musttail call (signaled by CodeGenFunction::MustTailCall being non-null) and take the addUncopiedAggregate path so the LValue of the original parameter reaches EmitCall. The forwarding helper then forwards the incoming Argument, no agg.tmp. Limited to trivially-copyable types so we never elide observable copy constructors or skip destruction. Non-trivial copy ctors / dtors still take the existing materialize-then-copy path. Test: clang/test/CodeGen/musttail-indirect-arg.cpp (companion to the C test) covers plain forward, two-arg, swapped, and verifies the trivial-copy elision does NOT engage for non-trivial copy ctors or non-musttail tail calls. clang/test/CodeGen + CodeGenCXX (7307 tests) clean. Runtime cross-arch sweep on a C++ minimal repro passes on x86_64, riscv64, aarch64, arm, loongarch64, s390x; the same repro fails on RV64/AArch64 without this commit, confirming the new path catches the C++ bug shape. Fixes the C++ scope hole called out in the prior commit (11cdbbffa6ae). Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 24 +++++++ clang/test/CodeGen/musttail-indirect-arg.cpp | 73 ++++++++++++++++++++ 2 files changed, 97 insertions(+) create mode 100644 clang/test/CodeGen/musttail-indirect-arg.cpp diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index ad402c459b892..c34c44571b33d 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5139,6 +5139,30 @@ void CodeGenFunction::EmitCallArg(CallArgList &args, const Expr *E, return; } + // For musttail calls in C++, skip the agg.tmp copy when the source is a + // trivially-copyable parameter of the current function. The copy would + // live in our frame, which the tail call deallocates before the callee + // dereferences it. The companion fix in EmitCall (getForwardableIncoming + // MustTailArg) forwards the incoming Indirect Argument; for that to fire + // here we must hand it the original parameter LValue, not a fresh temp. + if (HasAggregateEvalKind && MustTailCall && type->isRecordType() && + type.isTriviallyCopyableType(getContext())) { + if (const auto *CCE = dyn_cast<CXXConstructExpr>(E)) { + if (CCE->getConstructor()->isCopyOrMoveConstructor() && + CCE->getConstructor()->isTrivial() && CCE->getNumArgs() == 1) { + const Expr *Source = CCE->getArg(0)->IgnoreParenImpCasts(); + if (const auto *DRE = dyn_cast<DeclRefExpr>(Source)) + if (const auto *PVD = dyn_cast<ParmVarDecl>(DRE->getDecl())) + if (PVD->getDeclContext() == dyn_cast<DeclContext>(CurCodeDecl)) { + LValue L = EmitLValue(DRE); + assert(L.isSimple()); + args.addUncopiedAggregate(L, type); + return; + } + } + } + } + args.add(EmitAnyExprToTemp(E), type); } diff --git a/clang/test/CodeGen/musttail-indirect-arg.cpp b/clang/test/CodeGen/musttail-indirect-arg.cpp new file mode 100644 index 0000000000000..a161f979b5a4c --- /dev/null +++ b/clang/test/CodeGen/musttail-indirect-arg.cpp @@ -0,0 +1,73 @@ +// Test that Clang forwards incoming Indirect parameters across musttail calls +// for C++ struct-by-value arguments with trivially-copyable types. Companion to +// musttail-indirect-arg.c (the C side of the same fix) and musttail-sret.cpp +// (the SRet precedent in a96c14eeb8fc). +// +// C++ goes through a different EmitCallArg path than C: the call argument is a +// CXXConstructExpr invoking the implicit copy constructor, which would +// otherwise materialize an agg.tmp before EmitCall. For musttail with a +// trivially-copyable parameter forwarded directly, the copy is elided so the +// helper in EmitCall can forward the incoming llvm::Argument. + +// RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON +// RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON + +// A trivially-copyable struct large enough to land on the indirect-arg path +// on RV64, AArch64, LoongArch64, SystemZ. +struct Big { + unsigned long long a, b, c, d; +}; + +// Plain forward: caller(B) musttails callee(B). No agg.tmp copy, no +// byval-temp; the incoming parameter %a is forwarded directly. +struct Big C1(struct Big a); +struct Big P1(struct Big a) { + [[clang::musttail]] return C1(a); +} +// COMMON-LABEL: define {{.*}} @_Z2P13Big( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z2C13Big({{.*}} %a) + +// Two args, same forwarding. +struct Big C2(struct Big a, struct Big b); +struct Big P2(struct Big a, struct Big b) { + [[clang::musttail]] return C2(a, b); +} +// COMMON-LABEL: define {{.*}} @_Z2P23BigS_( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z2C23BigS_({{.*}} %a, {{.*}} %b) + +// Swapped args. +struct Big C3(struct Big x, struct Big y); +struct Big P3(struct Big a, struct Big b) { + [[clang::musttail]] return C3(b, a); +} +// COMMON-LABEL: define {{.*}} @_Z2P33BigS_( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z2C33BigS_({{.*}} %b, {{.*}} %a) + +// Non-trivial copy constructor: the trivial-copy elision must NOT engage. +// Existing path materializes the agg.tmp (the user-defined copy ctor has +// observable behavior). +struct NonTrivial { + unsigned long long parts[4]; + NonTrivial(const NonTrivial &); +}; +NonTrivial C4(NonTrivial a); +NonTrivial P4(NonTrivial a) { + [[clang::musttail]] return C4(a); +} +// COMMON-LABEL: define {{.*}} @_Z2P410NonTrivial( +// The user-defined copy ctor IS called (the agg.tmp pattern still happens): +// COMMON: call {{.*}} @_ZN10NonTrivialC1ERKS_ + +// Non-musttail tail call: trivial-copy elision must NOT engage; the regular +// agg.tmp copy is still emitted. +struct Big C5(struct Big a); +struct Big P5(struct Big a) { + return C5(a); +} +// COMMON-LABEL: define {{.*}} @_Z2P53Big( +// COMMON-NOT: musttail >From 9b1aa8b1228237096222c71ecaa4ea73c5b5b96c Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Sat, 23 May 2026 21:37:39 +0200 Subject: [PATCH 5/6] [Clang][test] Mirror C test cases in musttail-indirect-arg.cpp Add the C++ analogs of P5/P6/P7/P8 from the C test that the C++ test was missing per code review: - P5: modify-then-musttail (trivial-copy elision still engages). - P6: cross-BB musttail (elision works across basic blocks). - P7: same Argument forwarded to two slots (pins noalias dedup behavior; incoming Indirect params are not noalias under Linux C++). - P8: source is a non-parameter local (elision must NOT engage, byval-temp pattern remains). Existing non-musttail test renumbered P5 -> P9. Closes the test-coverage gap flagged by the second review on commit 43590837d6a8. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]> --- clang/test/CodeGen/musttail-indirect-arg.cpp | 52 ++++++++++++++++++-- 1 file changed, 49 insertions(+), 3 deletions(-) diff --git a/clang/test/CodeGen/musttail-indirect-arg.cpp b/clang/test/CodeGen/musttail-indirect-arg.cpp index a161f979b5a4c..4c3203ca12cf2 100644 --- a/clang/test/CodeGen/musttail-indirect-arg.cpp +++ b/clang/test/CodeGen/musttail-indirect-arg.cpp @@ -63,11 +63,57 @@ NonTrivial P4(NonTrivial a) { // The user-defined copy ctor IS called (the agg.tmp pattern still happens): // COMMON: call {{.*}} @_ZN10NonTrivialC1ERKS_ -// Non-musttail tail call: trivial-copy elision must NOT engage; the regular -// agg.tmp copy is still emitted. +// Caller modifies the parameter before the musttail. The trivial-copy +// elision still engages because the source LValue is still the parameter; +// any mutation flowed through the incoming pointer is observed by the +// forwarded call. struct Big C5(struct Big a); struct Big P5(struct Big a) { - return C5(a); + a.a += 1; + [[clang::musttail]] return C5(a); } // COMMON-LABEL: define {{.*}} @_Z2P53Big( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z2C53Big({{.*}} %a) + +// musttail behind a branch: trivial-copy elision must work across BBs. +struct Big C6(struct Big a, int cond); +struct Big P6(struct Big a, int cond) { + if (cond) + [[clang::musttail]] return C6(a, cond); + return a; +} +// COMMON-LABEL: define {{.*}} @_Z2P63Bigi( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z2C63Bigi({{.*}} %a, + +// Same Argument forwarded to two slots: both engage. Incoming Indirect +// params are not noalias under the Linux C++ ABI so the dedup in the +// helper does not fire and both slots forward %a directly. +struct Big C7(struct Big x, struct Big y); +struct Big P7(struct Big a, struct Big b) { + [[clang::musttail]] return C7(a, a); +} +// COMMON-LABEL: define {{.*}} @_Z2P73BigS_( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z2C73BigS_({{.*}} %a, {{.*}} %a) + +// Negative: source is a local, not a parameter. The trivial-copy elision +// must NOT engage; the byval-temp pattern remains. +struct Big C8(struct Big a); +struct Big P8(struct Big a) { + struct Big local = {1, 2, 3, 4}; + [[clang::musttail]] return C8(local); +} +// COMMON-LABEL: define {{.*}} @_Z2P83Big( +// COMMON: = alloca +// COMMON: musttail call {{.*}} @_Z2C83Big( + +// Non-musttail tail call: trivial-copy elision must NOT engage; the regular +// agg.tmp copy is still emitted. +struct Big C9(struct Big a); +struct Big P9(struct Big a) { + return C9(a); +} +// COMMON-LABEL: define {{.*}} @_Z2P93Big( // COMMON-NOT: musttail >From 99dfa79d0d3f5424b36bf4ea01460f76dec4f98a Mon Sep 17 00:00:00 2001 From: Xavier Roche <[email protected]> Date: Tue, 26 May 2026 10:11:03 +0200 Subject: [PATCH 6/6] [Clang] Switch musttail Indirect to a two-phase general algorithm Addresses review on commit 9b1aa8b12282 (efriedma-quic): the prior code optimized only the "source is incoming Indirect param" case, and test P7 asserted a miscompile (forwarding the same incoming pointer to two call slots aliases two by-value parameters, which violates the C ABI rule of distinct storage per slot). For each musttail Indirect slot i, the i-th incoming Indirect parameter's pointer (CurFn->arg_begin()+FirstIRArg, distinct and well-defined by musttail prototype-match) is the destination. If the source LValue already equals that pointer, pass it. Otherwise the value is copied into the incoming-param storage and that pointer is passed. A single-phase implementation is unsound for permutations like C(b, a) where slot-by-slot in-place writes clobber sources later slots need. Two phases: - Phase 1, in the per-arg loop: when source != destination, capture the source value into a scratch alloca in this frame. - Phase 2, after the per-arg loop: write each scratch into its matching incoming-param destination. Scratches live in this frame and die at tail-call teardown; they are consumed before the tail jump so this is safe. Destinations live in the caller's caller's frame and survive the tail call. EmitCallArg broadens the C++ trivial-copy gate from "ParmVarDecl of this function" to "param or local of this function" so the source LValue reaches EmitCall in both cases. Tests rewritten in spec-driven form: forward, distinct, swap, modify-then-forward, cross-BB, same-arg-twice/thrice, local source, mixed direct+indirect, many-args with stack spill, over-aligned struct, C++ member function, and non-trivial copy ctor as a negative. Local validation: check-clang 50997/52167. Runtime probe under QEMU on aarch64, riscv64, arm-linux-gnueabihf, and s390x passes; the same probe fails on aarch64 and riscv64 with pre-patch clang, exposing the alias and swap miscompiles. Assisted-by: Claude (Anthropic) Co-Authored-By: Claude Opus 4.6 <[email protected]> --- clang/lib/CodeGen/CGCall.cpp | 127 +++++++++-------- clang/test/CodeGen/musttail-indirect-arg.c | 135 +++++++++++-------- clang/test/CodeGen/musttail-indirect-arg.cpp | 118 +++++++++------- 3 files changed, 223 insertions(+), 157 deletions(-) diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp index c34c44571b33d..04774cc967071 100644 --- a/clang/lib/CodeGen/CGCall.cpp +++ b/clang/lib/CodeGen/CGCall.cpp @@ -5139,26 +5139,30 @@ void CodeGenFunction::EmitCallArg(CallArgList &args, const Expr *E, return; } - // For musttail calls in C++, skip the agg.tmp copy when the source is a - // trivially-copyable parameter of the current function. The copy would - // live in our frame, which the tail call deallocates before the callee - // dereferences it. The companion fix in EmitCall (getForwardableIncoming - // MustTailArg) forwards the incoming Indirect Argument; for that to fire - // here we must hand it the original parameter LValue, not a fresh temp. + // Under musttail, hand a trivially-copyable record source's LValue to + // EmitCall rather than materializing an agg.tmp. EmitCall's Indirect path + // routes it via the matching incoming parameter, which survives the tail + // call. Limited to params and locals: globals and captures don't have the + // dangle issue and the existing path may be more efficient for them. if (HasAggregateEvalKind && MustTailCall && type->isRecordType() && type.isTriviallyCopyableType(getContext())) { if (const auto *CCE = dyn_cast<CXXConstructExpr>(E)) { if (CCE->getConstructor()->isCopyOrMoveConstructor() && CCE->getConstructor()->isTrivial() && CCE->getNumArgs() == 1) { const Expr *Source = CCE->getArg(0)->IgnoreParenImpCasts(); - if (const auto *DRE = dyn_cast<DeclRefExpr>(Source)) - if (const auto *PVD = dyn_cast<ParmVarDecl>(DRE->getDecl())) - if (PVD->getDeclContext() == dyn_cast<DeclContext>(CurCodeDecl)) { + if (const auto *DRE = dyn_cast<DeclRefExpr>(Source)) { + if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl())) { + if (VD->hasLocalStorage() || + (isa<ParmVarDecl>(VD) && + VD->getDeclContext() == + dyn_cast<DeclContext>(CurCodeDecl))) { LValue L = EmitLValue(DRE); assert(L.isSimple()); args.addUncopiedAggregate(L, type); return; } + } + } } } } @@ -5472,43 +5476,15 @@ static unsigned getMaxVectorWidth(const llvm::Type *Ty) { return MaxVectorWidth; } -/// Returns the incoming llvm::Argument of the current function if this -/// musttail call argument forwards an incoming Indirect parameter with a -/// matching ABI shape; nullptr to fall through to the byval-temp path. -/// Argument-side analog of a96c14eeb8fc (SRet musttail forwarding). -static llvm::Argument *getForwardableIncomingMustTailArg( - CodeGenFunction &CGF, const CallArg &CallArgument, - const ABIArgInfo &CallSlotInfo, - llvm::SmallPtrSetImpl<llvm::Argument *> &AlreadyForwarded) { - Address SrcAddr = Address::invalid(); - if (CallArgument.hasLValue()) - SrcAddr = CallArgument.getKnownLValue().getAddress(); - else if (CallArgument.getKnownRValue().isAggregate()) - SrcAddr = CallArgument.getKnownRValue().getAggregateAddress(); - else - return nullptr; - llvm::Value *SrcPtr = SrcAddr.emitRawPointer(CGF); - - // Peek through one AddrSpaceCastInst (NVPTX / AMDGPU / SPIR wrap incoming - // Indirect params via EmitParmDecl). Do not unwrap loads: a load through - // a local alloca means the source is a local. +/// Peel one AddrSpaceCastInst from \p SrcPtr. EmitParmDecl wraps incoming +/// Indirect params via address-space cast on NVPTX/AMDGPU/SPIR, so peeling +/// exposes the underlying llvm::Argument when the source IS a forwarded +/// incoming parameter. Loads are NOT unwrapped: a load through a local +/// alloca means the source is a local. +static llvm::Value *peelAddrSpaceCast(llvm::Value *SrcPtr) { if (auto *ASC = llvm::dyn_cast<llvm::AddrSpaceCastInst>(SrcPtr)) - SrcPtr = ASC->getOperand(0); - - auto *IncomingArg = llvm::dyn_cast<llvm::Argument>(SrcPtr); - if (!IncomingArg || IncomingArg->getParent() != CGF.CurFn) - return nullptr; - - // Verifier V7 requires matching ABI attributes across musttail. - if (IncomingArg->hasByValAttr() != CallSlotInfo.getIndirectByVal()) - return nullptr; - - // Do not forward the same noalias Argument to two slots in one call. - if (IncomingArg->hasNoAliasAttr() && - !AlreadyForwarded.insert(IncomingArg).second) - return nullptr; - - return IncomingArg; + return ASC->getOperand(0); + return SrcPtr; } RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, @@ -5634,9 +5610,16 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, // markers that need to be ended right after the call. SmallVector<CallLifetimeEnd, 2> CallLifetimeEndAfterCall; - // Tracks incoming Arguments already forwarded by a musttail Indirect arg, - // for noalias deduplication in getForwardableIncomingMustTailArg. - llvm::SmallPtrSet<llvm::Argument *, 4> ForwardedMustTailArgs; + // Deferred Phase-2 writes for musttail Indirect args. Splitting reads + // (in the per-arg loop) from writes (after the loop) lets permutations + // like C(b, a) land correctly: all sources are captured into scratches + // before any incoming-param destination is overwritten. + struct MustTailIndirectCopy { + LValue Scratch; + LValue Dst; + QualType Ty; + }; + llvm::SmallVector<MustTailIndirectCopy, 4> MustTailIndirectCopies; // Translate all of the arguments as necessary to match the IR lowering. assert(CallInfo.arg_size() == CallArgs.size() && @@ -5711,20 +5694,45 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, case ABIArgInfo::IndirectAliased: { assert(NumIRArgs == 1); - // For musttail, forward an incoming Indirect parameter directly. A - // local alloca would dangle after the tail call. Mirrors the SRet - // forwarding above (a96c14eeb8fc). Limited to Indirect (not - // IndirectAliased) so the byval-ness check in the helper does not - // assert on getIndirectByVal(). + // Musttail Indirect: route via the matching incoming parameter. + // Prototype-match (Verifier V5/V6/V7) makes CurFn->arg_begin()+ + // FirstIRArg a distinct destination for this slot that lives in the + // caller's caller's frame and survives the tail call. To handle + // permutations safely, the source value is captured into a scratch + // alloca here (Phase 1); the write to the incoming-param destination + // is deferred until after all sources have been read (Phase 2 below). + // IndirectAliased uses the existing fallback (different source-AS). if (IsMustTail && ArgInfo.isIndirect()) { - if (llvm::Argument *FwdArg = getForwardableIncomingMustTailArg( - *this, *I, ArgInfo, ForwardedMustTailArgs)) { - llvm::Value *Val = FwdArg; + llvm::Argument *IncomingArg = CurFn->arg_begin() + FirstIRArg; + llvm::Value *Dst = IncomingArg; + Address SrcAddr = Address::invalid(); + if (I->hasLValue()) + SrcAddr = I->getKnownLValue().getAddress(); + else if (I->getKnownRValue().isAggregate()) + SrcAddr = I->getKnownRValue().getAggregateAddress(); + if (SrcAddr.isValid()) { + llvm::Value *Src = peelAddrSpaceCast(SrcAddr.emitRawPointer(*this)); + if (Src != Dst) { + CharUnits Align = ArgInfo.getIndirectAlign(); + QualType Ty = I->Ty; + llvm::Type *ElemTy = ConvertTypeForMem(Ty); + RawAddress Scratch = + CreateMemTempWithoutCast(Ty, Align, "musttail.copy"); + LValue ScratchLV = MakeAddrLValue(Scratch, Ty); + LValue SrcLV = MakeAddrLValue(SrcAddr, Ty); + EmitAggregateCopy(ScratchLV, SrcLV, Ty, + AggValueSlot::DoesNotOverlap); + LValue DstLV = + MakeAddrLValue(Address(Dst, ElemTy, Align), Ty); + MustTailIndirectCopies.push_back({ScratchLV, DstLV, Ty}); + } + llvm::Value *Val = Dst; if (ArgHasMaybeUndefAttr) Val = Builder.CreateFreeze(Val); IRCallArgs[FirstIRArg] = Val; break; } + // No clean source address (rare): fall through to byval-temp. } if (I->isAggregate()) { @@ -6053,6 +6061,13 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo, } } + // Phase 2 of the musttail Indirect-arg copy: flush each captured scratch + // into its incoming-param destination. Phase 1 has read every source, so + // permutations like C(b, a) land the right value in each slot. + for (const auto &Copy : MustTailIndirectCopies) + EmitAggregateCopy(Copy.Dst, Copy.Scratch, Copy.Ty, + AggValueSlot::DoesNotOverlap); + const CGCallee &ConcreteCallee = Callee.prepareConcreteCallee(*this); llvm::Value *CalleePtr = ConcreteCallee.getFunctionPointer(); diff --git a/clang/test/CodeGen/musttail-indirect-arg.c b/clang/test/CodeGen/musttail-indirect-arg.c index 0e7493f9e38b7..fd82e9cf900c1 100644 --- a/clang/test/CodeGen/musttail-indirect-arg.c +++ b/clang/test/CodeGen/musttail-indirect-arg.c @@ -1,76 +1,60 @@ -// Test that Clang forwards incoming Indirect parameters across musttail calls -// instead of creating a byval-temp alloca that would dangle after the tail call -// deallocates the caller's frame. -// -// Companion to musttail-sret.cpp (commit a96c14eeb8fc): same idea, applied to -// incoming arguments rather than the sret return slot. - // RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON // RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON // RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON // RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON -// A struct large enough to land on the indirect-arg path on RV64 (>2*XLEN=16 -// bytes), AArch64 (>16 bytes), LoongArch64, SystemZ. +// Musttail calls with struct-by-value args must route each value through the +// matching incoming Indirect parameter's storage, not a local alloca that +// dangles past the tail-call frame teardown. For each Indirect slot i: if +// the source IS the i-th incoming pointer, pass it; otherwise memcpy the +// source into the i-th incoming pointer and pass that. Each call slot ends +// up with a distinct pointer in the caller's caller's frame. + +// Plain Indirect-ABI struct on the targets above. struct Big { unsigned long long a, b, c, d; }; -// Plain forward: caller(B) musttails callee(B). The fix should emit no -// alloca for the forwarded arg; the call should forward the incoming -// parameter %a. +// P1: simple forward. struct Big C1(struct Big a); struct Big P1(struct Big a) { __attribute__((musttail)) return C1(a); } // COMMON-LABEL: define {{.*}} @P1( // COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON-NOT: = alloca [32 x i8] -// COMMON: musttail call {{.*}} @C1({{.*}} %a) +// COMMON: musttail call {{.*}} @C1({{.*}}, ptr {{.*}} %a) -// Two indirect args, same forwarding: each forwards its own incoming param. +// P2: two distinct incoming sources. struct Big C2(struct Big a, struct Big b); struct Big P2(struct Big a, struct Big b) { __attribute__((musttail)) return C2(a, b); } // COMMON-LABEL: define {{.*}} @P2( // COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @C2({{.*}} %a, {{.*}} %b) +// COMMON-NOT: llvm.memcpy +// COMMON: musttail call {{.*}} @C2({{.*}}, ptr {{.*}} %a, ptr {{.*}} %b) -// Swapped args: caller(a, b) musttails callee(b, a). Each forwarded slot -// must resolve to the correct incoming Argument, not by position. +// P3: swap. Slot 0's dst is %a (positional), so the value of %b is copied +// into %a; symmetrically for slot 1. Two-phase emit captures both sources +// before either destination is overwritten. struct Big C3(struct Big x, struct Big y); struct Big P3(struct Big a, struct Big b) { __attribute__((musttail)) return C3(b, a); } // COMMON-LABEL: define {{.*}} @P3( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @C3({{.*}} %b, {{.*}} %a) - -// Mixed direct + indirect: only the indirect arg is affected by the fix. -struct Big C4(int n, struct Big a); -struct Big P4(int n, struct Big a) { - __attribute__((musttail)) return C4(n, a); -} -// COMMON-LABEL: define {{.*}} @P4( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @C4({{.*}} %n, {{.*}} %a) +// COMMON: musttail call {{.*}} @C3({{.*}}, ptr {{.*}}, ptr {{.*}}) -// Caller modifies the parameter before the musttail. Clang lowers the -// write through the incoming pointer, and the fix forwards the same -// pointer to the callee. No byval-temp. +// P5: caller mutates the parameter before the musttail. The mutation lands +// at the incoming pointer the callee receives. struct Big C5(struct Big a); struct Big P5(struct Big a) { a.a += 1; __attribute__((musttail)) return C5(a); } // COMMON-LABEL: define {{.*}} @P5( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @C5({{.*}} %a) +// COMMON: musttail call {{.*}} @C5({{.*}}, ptr {{.*}} %a) -// musttail behind a branch: the forwarded pointer must remain live across -// the basic block transition. Tests that the helper does not assume the -// musttail is in the entry block. +// P6: musttail in a non-entry block. struct Big C6(struct Big a, int cond); struct Big P6(struct Big a, int cond) { if (cond) @@ -78,41 +62,82 @@ struct Big P6(struct Big a, int cond) { return a; } // COMMON-LABEL: define {{.*}} @P6( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @C6({{.*}} %a, +// COMMON: musttail call {{.*}} @C6({{.*}}, ptr {{.*}} %a, -// Same Argument forwarded to two slots: the helper engages for both. The -// noalias deduplication, if it ever fired, would force the second slot -// back to a byval-temp; but incoming Indirect params under the Linux C -// ABI are not noalias, so both slots forward %a directly. This pins the -// behavior so a future change introducing noalias on Indirect params -// would surface here. (musttail requires matching prototypes, so caller -// and callee both take two Big args.) +// P7: same arg to two slots. C ABI requires distinct storage per by-value +// param, so slot 1 cannot share %a's pointer. Slot 0 forwards %a; slot 1 +// memcpys *%a into the i=1 incoming pointer %b and forwards %b. struct Big C7(struct Big x, struct Big y); struct Big P7(struct Big a, struct Big b) { __attribute__((musttail)) return C7(a, a); } // COMMON-LABEL: define {{.*}} @P7( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @C7({{.*}} %a, {{.*}} %a) +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %b, ptr {{.*}} %a, +// COMMON: musttail call {{.*}} @C7({{.*}}, ptr {{.*}} %a, ptr {{.*}} %b) -// Negative: local source. Caller takes Big a, but musttails with a LOCAL -// Big initialized in caller's frame. The byval-temp must remain because the -// source lives in caller's frame and would dangle if forwarded. +// P8: local source. The local lives in our frame; copy it into %a, forward %a. struct Big C8(struct Big a); struct Big P8(struct Big a) { struct Big local = {1, 2, 3, 4}; __attribute__((musttail)) return C8(local); } // COMMON-LABEL: define {{.*}} @P8( -// COMMON: = alloca -// COMMON: musttail call {{.*}} @C8( +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %a, ptr {{.*}} +// COMMON: musttail call {{.*}} @C8({{.*}}, ptr {{.*}} %a) -// Non-musttail tail call: the fix must NOT engage. Existing path emits -// the byval-temp as before, no musttail in the IR. +// P9: non-musttail tail call (existing path). struct Big C9(struct Big a); struct Big P9(struct Big a) { return C9(a); } // COMMON-LABEL: define {{.*}} @P9( // COMMON-NOT: musttail + +// P10: mixed direct + indirect. +struct Big C10(int x1, struct Big s1, int x2, struct Big s2); +struct Big P10(int x1, struct Big s1, int x2, struct Big s2) { + __attribute__((musttail)) return C10(x1, s1, x2, s2); +} +// COMMON-LABEL: define {{.*}} @P10( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C10({{.*}}, i32 {{.*}} %x1, ptr {{.*}} %s1, i32 {{.*}} %x2, ptr {{.*}} %s2) + +// P11: many args, including stack-spilled ones on the target ABIs above. +struct Big C11(struct Big s1, struct Big s2, struct Big s3, struct Big s4, + struct Big s5, struct Big s6, struct Big s7, struct Big s8, + struct Big s9, struct Big s10); +struct Big P11(struct Big a1, struct Big a2, struct Big a3, struct Big a4, + struct Big a5, struct Big a6, struct Big a7, struct Big a8, + struct Big a9, struct Big a10) { + __attribute__((musttail)) return C11(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10); +} +// COMMON-LABEL: define {{.*}} @P11( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @C11( +// COMMON-SAME: ptr {{.*}} %a1, ptr {{.*}} %a2, ptr {{.*}} %a3, ptr {{.*}} %a4 +// COMMON-SAME: ptr {{.*}} %a5, ptr {{.*}} %a6, ptr {{.*}} %a7, ptr {{.*}} %a8 +// COMMON-SAME: ptr {{.*}} %a9, ptr {{.*}} %a10 + +// P12: over-aligned struct. +struct __attribute__((aligned(32))) AlignedBig { + unsigned long long a, b, c, d; +}; +struct AlignedBig C12(struct AlignedBig a); +struct AlignedBig P12(struct AlignedBig a) { + __attribute__((musttail)) return C12(a); +} +// COMMON-LABEL: define {{.*}} @P12( +// COMMON: musttail call {{.*}} @C12({{.*}}, ptr {{.*}} %a) + +// P17: same arg to three slots (generalization of P7). +struct Big C17(struct Big x, struct Big y, struct Big z); +struct Big P17(struct Big a, struct Big b, struct Big c) { + __attribute__((musttail)) return C17(a, a, a); +} +// COMMON-LABEL: define {{.*}} @P17( +// At -O1 the optimizer may fold the per-slot reads, so only require that +// each destination (%b, %c) gets a memcpy/memmove and that the musttail +// call passes three distinct incoming pointers. +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %b, +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %c, +// COMMON: musttail call {{.*}} @C17({{.*}}, ptr {{.*}} %a, ptr {{.*}} %b, ptr {{.*}} %c) diff --git a/clang/test/CodeGen/musttail-indirect-arg.cpp b/clang/test/CodeGen/musttail-indirect-arg.cpp index 4c3203ca12cf2..3c8d037dd7d3b 100644 --- a/clang/test/CodeGen/musttail-indirect-arg.cpp +++ b/clang/test/CodeGen/musttail-indirect-arg.cpp @@ -1,56 +1,46 @@ -// Test that Clang forwards incoming Indirect parameters across musttail calls -// for C++ struct-by-value arguments with trivially-copyable types. Companion to -// musttail-indirect-arg.c (the C side of the same fix) and musttail-sret.cpp -// (the SRet precedent in a96c14eeb8fc). -// -// C++ goes through a different EmitCallArg path than C: the call argument is a -// CXXConstructExpr invoking the implicit copy constructor, which would -// otherwise materialize an agg.tmp before EmitCall. For musttail with a -// trivially-copyable parameter forwarded directly, the copy is elided so the -// helper in EmitCall can forward the incoming llvm::Argument. - // RUN: %clang_cc1 -triple=riscv64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON // RUN: %clang_cc1 -triple=aarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON // RUN: %clang_cc1 -triple=loongarch64-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON // RUN: %clang_cc1 -triple=s390x-linux-gnu %s -emit-llvm -O1 -o - | FileCheck %s --check-prefix=COMMON -// A trivially-copyable struct large enough to land on the indirect-arg path -// on RV64, AArch64, LoongArch64, SystemZ. +// C++ side of the musttail Indirect-arg fix. The call argument is typically +// a CXXConstructExpr invoking the trivial copy constructor; EmitCallArg +// detects the trivial-copy-from-DeclRefExpr case under musttail and hands +// the source LValue to EmitCall so the general path engages. Non-trivial +// copy or move constructors keep the existing agg.tmp path. + struct Big { unsigned long long a, b, c, d; }; -// Plain forward: caller(B) musttails callee(B). No agg.tmp copy, no -// byval-temp; the incoming parameter %a is forwarded directly. +// P1: simple forward. struct Big C1(struct Big a); struct Big P1(struct Big a) { [[clang::musttail]] return C1(a); } // COMMON-LABEL: define {{.*}} @_Z2P13Big( // COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @_Z2C13Big({{.*}} %a) +// COMMON: musttail call {{.*}} @_Z2C13Big({{.*}}, ptr {{.*}} %a) -// Two args, same forwarding. +// P2: two distinct args. struct Big C2(struct Big a, struct Big b); struct Big P2(struct Big a, struct Big b) { [[clang::musttail]] return C2(a, b); } // COMMON-LABEL: define {{.*}} @_Z2P23BigS_( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @_Z2C23BigS_({{.*}} %a, {{.*}} %b) +// COMMON-NOT: llvm.memcpy +// COMMON: musttail call {{.*}} @_Z2C23BigS_({{.*}}, ptr {{.*}} %a, ptr {{.*}} %b) -// Swapped args. +// P3: swapped args. struct Big C3(struct Big x, struct Big y); struct Big P3(struct Big a, struct Big b) { [[clang::musttail]] return C3(b, a); } // COMMON-LABEL: define {{.*}} @_Z2P33BigS_( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @_Z2C33BigS_({{.*}} %b, {{.*}} %a) -// Non-trivial copy constructor: the trivial-copy elision must NOT engage. -// Existing path materializes the agg.tmp (the user-defined copy ctor has -// observable behavior). +// P4: non-trivial copy constructor. The trivial-copy gate must NOT engage; +// the user-defined copy ctor IS called. Dangling-stack bug in this corner +// remains (out of scope). struct NonTrivial { unsigned long long parts[4]; NonTrivial(const NonTrivial &); @@ -60,23 +50,18 @@ NonTrivial P4(NonTrivial a) { [[clang::musttail]] return C4(a); } // COMMON-LABEL: define {{.*}} @_Z2P410NonTrivial( -// The user-defined copy ctor IS called (the agg.tmp pattern still happens): // COMMON: call {{.*}} @_ZN10NonTrivialC1ERKS_ -// Caller modifies the parameter before the musttail. The trivial-copy -// elision still engages because the source LValue is still the parameter; -// any mutation flowed through the incoming pointer is observed by the -// forwarded call. +// P5: modify-then-forward. struct Big C5(struct Big a); struct Big P5(struct Big a) { a.a += 1; [[clang::musttail]] return C5(a); } // COMMON-LABEL: define {{.*}} @_Z2P53Big( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @_Z2C53Big({{.*}} %a) +// COMMON: musttail call {{.*}} @_Z2C53Big({{.*}}, ptr {{.*}} %a) -// musttail behind a branch: trivial-copy elision must work across BBs. +// P6: musttail behind a branch. struct Big C6(struct Big a, int cond); struct Big P6(struct Big a, int cond) { if (cond) @@ -84,36 +69,77 @@ struct Big P6(struct Big a, int cond) { return a; } // COMMON-LABEL: define {{.*}} @_Z2P63Bigi( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @_Z2C63Bigi({{.*}} %a, +// COMMON: musttail call {{.*}} @_Z2C63Bigi({{.*}}, ptr {{.*}} %a, -// Same Argument forwarded to two slots: both engage. Incoming Indirect -// params are not noalias under the Linux C++ ABI so the dedup in the -// helper does not fire and both slots forward %a directly. +// P7: same arg to two slots. Slot 0 forwards %a; slot 1 memcpys *%a into the +// i=1 incoming pointer %b and forwards %b. struct Big C7(struct Big x, struct Big y); struct Big P7(struct Big a, struct Big b) { [[clang::musttail]] return C7(a, a); } // COMMON-LABEL: define {{.*}} @_Z2P73BigS_( -// COMMON-NOT: = alloca {{.*}}struct.Big -// COMMON: musttail call {{.*}} @_Z2C73BigS_({{.*}} %a, {{.*}} %a) +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %b, ptr {{.*}} %a, +// COMMON: musttail call {{.*}} @_Z2C73BigS_({{.*}}, ptr {{.*}} %a, ptr {{.*}} %b) -// Negative: source is a local, not a parameter. The trivial-copy elision -// must NOT engage; the byval-temp pattern remains. +// P8: local source. Copied into the incoming %a, then %a forwarded. struct Big C8(struct Big a); struct Big P8(struct Big a) { struct Big local = {1, 2, 3, 4}; [[clang::musttail]] return C8(local); } // COMMON-LABEL: define {{.*}} @_Z2P83Big( -// COMMON: = alloca -// COMMON: musttail call {{.*}} @_Z2C83Big( +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %a, ptr {{.*}} +// COMMON: musttail call {{.*}} @_Z2C83Big({{.*}}, ptr {{.*}} %a) -// Non-musttail tail call: trivial-copy elision must NOT engage; the regular -// agg.tmp copy is still emitted. +// P9: non-musttail tail call (existing path). struct Big C9(struct Big a); struct Big P9(struct Big a) { return C9(a); } // COMMON-LABEL: define {{.*}} @_Z2P93Big( // COMMON-NOT: musttail + +// P10: mixed direct + indirect. +struct Big C10(int x1, struct Big s1, int x2, struct Big s2); +struct Big P10(int x1, struct Big s1, int x2, struct Big s2) { + [[clang::musttail]] return C10(x1, s1, x2, s2); +} +// COMMON-LABEL: define {{.*}} @_Z3P10i3BigiS_( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z3C10i3BigiS_({{.*}}, i32 {{.*}} %x1, ptr {{.*}} %s1, i32 {{.*}} %x2, ptr {{.*}} %s2) + +// P11: many args (stack spill on the target ABIs above). +struct Big C11(struct Big s1, struct Big s2, struct Big s3, struct Big s4, + struct Big s5, struct Big s6, struct Big s7, struct Big s8, + struct Big s9, struct Big s10); +struct Big P11(struct Big a1, struct Big a2, struct Big a3, struct Big a4, + struct Big a5, struct Big a6, struct Big a7, struct Big a8, + struct Big a9, struct Big a10) { + [[clang::musttail]] return C11(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10); +} +// COMMON-LABEL: define {{.*}} @_Z3P113BigS_S_S_S_S_S_S_S_S_( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_Z3C113BigS_S_S_S_S_S_S_S_S_( + +// P16: member function. (P15 lambda case skipped: Sema currently rejects +// musttail from a lambda's operator() to a non-member function, #119152.) +struct S { + struct Big f(struct Big a); + struct Big P16(struct Big a); +}; +struct Big S::P16(struct Big a) { + [[clang::musttail]] return f(a); +} +// COMMON-LABEL: define {{.*}} @_ZN1S3P16E3Big( +// COMMON-NOT: = alloca {{.*}}struct.Big +// COMMON: musttail call {{.*}} @_ZN1S1fE3Big({{.*}}, ptr {{.*}}, ptr {{.*}} %a) + +// P17: same arg to three slots (generalization of P7). +struct Big C17(struct Big x, struct Big y, struct Big z); +struct Big P17(struct Big a, struct Big b, struct Big c) { + [[clang::musttail]] return C17(a, a, a); +} +// COMMON-LABEL: define {{.*}} @_Z3P173BigS_S_( +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %b, +// COMMON: llvm.mem{{(cpy|move)}}{{.*}}(ptr {{.*}} %c, +// COMMON: musttail call {{.*}} @_Z3C173BigS_S_({{.*}}, ptr {{.*}} %a, ptr {{.*}} %b, ptr {{.*}} %c) _______________________________________________ cfe-commits mailing list [email protected] https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
