I will try implementing John's suggestions. Thanks.

Sam

From: rjmcc...@apple.com [mailto:rjmcc...@apple.com]
Sent: Saturday, March 10, 2018 12:14 PM
To: Richard Smith <rich...@metafoo.co.uk>
Cc: Liu, Yaxun (Sam) <yaxun....@amd.com>; cfe-commits 
<cfe-commits@lists.llvm.org>
Subject: Re: r326946 - CodeGen: Fix address space of indirect function argument

On Mar 9, 2018, at 8:51 PM, Richard Smith 
<rich...@metafoo.co.uk<mailto:rich...@metafoo.co.uk>> wrote:

Hi,

This change increases the size of a CallArg, and thus that of a CallArgList, 
dramatically (an LValue is *much* larger than an RValue, and unlike an RValue, 
does not appear to be at all optimized for size). This results in CallArgList 
(which contains inline storage for 16 CallArgs) ballooning in size from ~500 
bytes to 2KB, resulting in stack overflows on programs with deep ASTs.

Given that this introduces a regression (due to stack overflow), I've reverted 
in r327195. Sorry about that.

Seems reasonable.

The right short-term fix is probably a combination of the following:
  - put a little bit of effort into packing LValue (using fewer than 64 bits 
for the alignment and packing the bit-fields, I think)
  - drop the inline argument count to something like 8, which should still 
capture the vast majority of calls.

There are quite a few other space optimizations we could do the LValue 
afterwards.   I think we could eliminate BaseIVarExp completely by storing some 
sort of ObjC GC classification and then just evaluating the base expression 
normally.


The program it broke looked something like this:

struct VectorBuilder {
  VectorBuilder &operator,(int);
};
void f() {
  VectorBuilder(),
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,
  
1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,0;
}

using Eigen's somewhat-ridiculous comma initializer technique 
(https://eigen.tuxfamily.org/dox/group__TutorialAdvancedInitialization.html) to 
build an approx 40 x 40 array.

An AST 1600 levels deep is somewhat extreme, but it still seems like something 
we should be able to handle within our default 8MB stack limit.

I mean, at some point this really is not supportable, even if it happens to 
build on current compilers.  If we actually documented and enforced our 
implementation limits like we're supposed to, I do not think we would allow 
this much call nesting.

John.



On 7 March 2018 at 13:45, Yaxun Liu via cfe-commits 
<cfe-commits@lists.llvm.org<mailto:cfe-commits@lists.llvm.org>> wrote:
Author: yaxunl
Date: Wed Mar  7 13:45:40 2018
New Revision: 326946

URL: http://llvm.org/viewvc/llvm-project?rev=326946&view=rev
Log:
CodeGen: Fix address space of indirect function argument

The indirect function argument is in alloca address space in LLVM IR. However,
during Clang codegen for C++, the address space of indirect function argument
should match its address space in the source code, i.e., default addr space, 
even
for indirect argument. This is because destructor of the indirect argument may
be called in the caller function, and address of the indirect argument may be
taken, in either case the indirect function argument is expected to be in 
default
addr space, not the alloca address space.

Therefore, the indirect function argument should be mapped to the temp var
casted to default address space. The caller will cast it to alloca addr space
when passing it to the callee. In the callee, the argument is also casted to the
default address space and used.

CallArg is refactored to facilitate this fix.

Differential Revision: https://reviews.llvm.org/D34367

Added:
    cfe/trunk/test/CodeGenCXX/amdgcn-func-arg.cpp
Modified:
    cfe/trunk/lib/CodeGen/CGAtomic.cpp
    cfe/trunk/lib/CodeGen/CGCall.cpp
    cfe/trunk/lib/CodeGen/CGCall.h
    cfe/trunk/lib/CodeGen/CGClass.cpp
    cfe/trunk/lib/CodeGen/CGDecl.cpp
    cfe/trunk/lib/CodeGen/CGExprCXX.cpp
    cfe/trunk/lib/CodeGen/CGGPUBuiltin.cpp
    cfe/trunk/lib/CodeGen/CGObjCGNU.cpp
    cfe/trunk/lib/CodeGen/CGObjCMac.cpp
    cfe/trunk/lib/CodeGen/CodeGenFunction.h
    cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp
    cfe/trunk/lib/CodeGen/MicrosoftCXXABI.cpp
    
cfe/trunk/test/CodeGenOpenCL/addr-space-struct-arg.cl<http://addr-space-struct-arg.cl/>
    cfe/trunk/test/CodeGenOpenCL/byval.cl<http://byval.cl/>

Modified: cfe/trunk/lib/CodeGen/CGAtomic.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGAtomic.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGAtomic.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGAtomic.cpp Wed Mar  7 13:45:40 2018
@@ -1160,7 +1160,7 @@ RValue CodeGenFunction::EmitAtomicExpr(A
     if (UseOptimizedLibcall && Res.getScalarVal()) {
       llvm::Value *ResVal = Res.getScalarVal();
       if (PostOp) {
-        llvm::Value *LoadVal1 = Args[1].RV.getScalarVal();
+        llvm::Value *LoadVal1 = Args[1].getRValue(*this).getScalarVal();
         ResVal = Builder.CreateBinOp(PostOp, ResVal, LoadVal1);
       }
       if (E->getOp() == AtomicExpr::AO__atomic_nand_fetch)

Modified: cfe/trunk/lib/CodeGen/CGCall.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCall.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGCall.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGCall.cpp Wed Mar  7 13:45:40 2018
@@ -1040,42 +1040,49 @@ void CodeGenFunction::ExpandTypeFromArgs
 }

 void CodeGenFunction::ExpandTypeToArgs(
-    QualType Ty, RValue RV, llvm::FunctionType *IRFuncTy,
+    QualType Ty, CallArg Arg, llvm::FunctionType *IRFuncTy,
     SmallVectorImpl<llvm::Value *> &IRCallArgs, unsigned &IRCallArgPos) {
   auto Exp = getTypeExpansion(Ty, getContext());
   if (auto CAExp = dyn_cast<ConstantArrayExpansion>(Exp.get())) {
-    forConstantArrayExpansion(*this, CAExp, RV.getAggregateAddress(),
-                              [&](Address EltAddr) {
-      RValue EltRV =
-          convertTempToRValue(EltAddr, CAExp->EltTy, SourceLocation());
-      ExpandTypeToArgs(CAExp->EltTy, EltRV, IRFuncTy, IRCallArgs, 
IRCallArgPos);
-    });
+    Address Addr = Arg.hasLValue() ? Arg.getKnownLValue().getAddress()
+                                   : 
Arg.getKnownRValue().getAggregateAddress();
+    forConstantArrayExpansion(
+        *this, CAExp, Addr, [&](Address EltAddr) {
+          CallArg EltArg = CallArg(
+              convertTempToRValue(EltAddr, CAExp->EltTy, SourceLocation()),
+              CAExp->EltTy);
+          ExpandTypeToArgs(CAExp->EltTy, EltArg, IRFuncTy, IRCallArgs,
+                           IRCallArgPos);
+        });
   } else if (auto RExp = dyn_cast<RecordExpansion>(Exp.get())) {
-    Address This = RV.getAggregateAddress();
+    Address This = Arg.hasLValue() ? Arg.getKnownLValue().getAddress()
+                                   : 
Arg.getKnownRValue().getAggregateAddress();
     for (const CXXBaseSpecifier *BS : RExp->Bases) {
       // Perform a single step derived-to-base conversion.
       Address Base =
           GetAddressOfBaseClass(This, Ty->getAsCXXRecordDecl(), &BS, &BS + 1,
                                 /*NullCheckValue=*/false, SourceLocation());
-      RValue BaseRV = RValue::getAggregate(Base);
+      CallArg BaseArg = CallArg(RValue::getAggregate(Base), BS->getType());

       // Recurse onto bases.
-      ExpandTypeToArgs(BS->getType(), BaseRV, IRFuncTy, IRCallArgs,
+      ExpandTypeToArgs(BS->getType(), BaseArg, IRFuncTy, IRCallArgs,
                        IRCallArgPos);
     }

     LValue LV = MakeAddrLValue(This, Ty);
     for (auto FD : RExp->Fields) {
-      RValue FldRV = EmitRValueForField(LV, FD, SourceLocation());
-      ExpandTypeToArgs(FD->getType(), FldRV, IRFuncTy, IRCallArgs,
+      CallArg FldArg =
+          CallArg(EmitRValueForField(LV, FD, SourceLocation()), FD->getType());
+      ExpandTypeToArgs(FD->getType(), FldArg, IRFuncTy, IRCallArgs,
                        IRCallArgPos);
     }
   } else if (isa<ComplexExpansion>(Exp.get())) {
-    ComplexPairTy CV = RV.getComplexVal();
+    ComplexPairTy CV = Arg.getKnownRValue().getComplexVal();
     IRCallArgs[IRCallArgPos++] = CV.first;
     IRCallArgs[IRCallArgPos++] = CV.second;
   } else {
     assert(isa<NoExpansion>(Exp.get()));
+    auto RV = Arg.getKnownRValue();
     assert(RV.isScalar() &&
            "Unexpected non-scalar rvalue during struct expansion.");

@@ -3418,13 +3425,17 @@ void CodeGenFunction::EmitCallArgs(
     assert(InitialArgSize + 1 == Args.size() &&
            "The code below depends on only adding one arg per EmitCallArg");
     (void)InitialArgSize;
-    RValue RVArg = Args.back().RV;
-    EmitNonNullArgCheck(RVArg, ArgTypes[Idx], (*Arg)->getExprLoc(), AC,
-                        ParamsToSkip + Idx);
-    // @llvm.objectsize should never have side-effects and shouldn't need
-    // destruction/cleanups, so we can safely "emit" it after its arg,
-    // regardless of right-to-leftness
-    MaybeEmitImplicitObjectSize(Idx, *Arg, RVArg);
+    // Since pointer argument are never emitted as LValue, it is safe to emit
+    // non-null argument check for r-value only.
+    if (!Args.back().hasLValue()) {
+      RValue RVArg = Args.back().getKnownRValue();
+      EmitNonNullArgCheck(RVArg, ArgTypes[Idx], (*Arg)->getExprLoc(), AC,
+                          ParamsToSkip + Idx);
+      // @llvm.objectsize should never have side-effects and shouldn't need
+      // destruction/cleanups, so we can safely "emit" it after its arg,
+      // regardless of right-to-leftness
+      MaybeEmitImplicitObjectSize(Idx, *Arg, RVArg);
+    }
   }

   if (!LeftToRight) {
@@ -3471,6 +3482,31 @@ struct DisableDebugLocationUpdates {

 } // end anonymous namespace

+RValue CallArg::getRValue(CodeGenFunction &CGF) const {
+  if (!HasLV)
+    return RV;
+  LValue Copy = CGF.MakeAddrLValue(CGF.CreateMemTemp(Ty), Ty);
+  CGF.EmitAggregateCopy(Copy, LV, Ty, LV.isVolatile());
+  IsUsed = true;
+  return RValue::getAggregate(Copy.getAddress());
+}
+
+void CallArg::copyInto(CodeGenFunction &CGF, Address Addr) const {
+  LValue Dst = CGF.MakeAddrLValue(Addr, Ty);
+  if (!HasLV && RV.isScalar())
+    CGF.EmitStoreOfScalar(RV.getScalarVal(), Dst, /*init=*/true);
+  else if (!HasLV && RV.isComplex())
+    CGF.EmitStoreOfComplex(RV.getComplexVal(), Dst, /*init=*/true);
+  else {
+    auto Addr = HasLV ? LV.getAddress() : RV.getAggregateAddress();
+    LValue SrcLV = CGF.MakeAddrLValue(Addr, Ty);
+    CGF.EmitAggregateCopy(Dst, SrcLV, Ty,
+                          HasLV ? LV.isVolatileQualified()
+                                : RV.isVolatileQualified());
+  }
+  IsUsed = true;
+}
+
 void CodeGenFunction::EmitCallArg(CallArgList &args, const Expr *E,
                                   QualType type) {
   DisableDebugLocationUpdates Dis(*this, E);
@@ -3536,15 +3572,7 @@ void CodeGenFunction::EmitCallArg(CallAr
       cast<CastExpr>(E)->getCastKind() == CK_LValueToRValue) {
     LValue L = EmitLValue(cast<CastExpr>(E)->getSubExpr());
     assert(L.isSimple());
-    if (L.getAlignment() >= getContext().getTypeAlignInChars(type)) {
-      args.add(L.asAggregateRValue(), type, /*NeedsCopy*/true);
-    } else {
-      // We can't represent a misaligned lvalue in the CallArgList, so copy
-      // to an aligned temporary now.
-      LValue Dest = MakeAddrLValue(CreateMemTemp(type), type);
-      EmitAggregateCopy(Dest, L, type, L.isVolatile());
-      args.add(RValue::getAggregate(Dest.getAddress()), type);
-    }
+    args.addUncopiedAggregate(L, type);
     return;
   }

@@ -3702,16 +3730,6 @@ CodeGenFunction::EmitCallOrInvoke(llvm::
   return llvm::CallSite(Inst);
 }

-/// \brief Store a non-aggregate value to an address to initialize it.  For
-/// initialization, a non-atomic store will be used.
-static void EmitInitStoreOfNonAggregate(CodeGenFunction &CGF, RValue Src,
-                                        LValue Dst) {
-  if (Src.isScalar())
-    CGF.EmitStoreOfScalar(Src.getScalarVal(), Dst, /*init=*/true);
-  else
-    CGF.EmitStoreOfComplex(Src.getComplexVal(), Dst, /*init=*/true);
-}
-
 void CodeGenFunction::deferPlaceholderReplacement(llvm::Instruction *Old,
                                                   llvm::Value *New) {
   DeferredReplacements.push_back(std::make_pair(Old, New));
@@ -3804,7 +3822,6 @@ RValue CodeGenFunction::EmitCall(const C
   for (CallArgList::const_iterator I = CallArgs.begin(), E = CallArgs.end();
        I != E; ++I, ++info_it, ++ArgNo) {
     const ABIArgInfo &ArgInfo = info_it->info;
-    RValue RV = I->RV;

     // Insert a padding argument to ensure proper alignment.
     if (IRFunctionArgs.hasPaddingArg(ArgNo))
@@ -3818,13 +3835,16 @@ RValue CodeGenFunction::EmitCall(const C
     case ABIArgInfo::InAlloca: {
       assert(NumIRArgs == 0);
       assert(getTarget().getTriple().getArch() == llvm::Triple::x86);
-      if (RV.isAggregate()) {
+      if (I->isAggregate()) {
         // Replace the placeholder with the appropriate argument slot GEP.
+        Address Addr = I->hasLValue()
+                           ? I->getKnownLValue().getAddress()
+                           : I->getKnownRValue().getAggregateAddress();
         llvm::Instruction *Placeholder =
-            cast<llvm::Instruction>(RV.getAggregatePointer());
+            cast<llvm::Instruction>(Addr.getPointer());
         CGBuilderTy::InsertPoint IP = Builder.saveIP();
         Builder.SetInsertPoint(Placeholder);
-        Address Addr = 
createInAllocaStructGEP(ArgInfo.getInAllocaFieldIndex());
+        Addr = createInAllocaStructGEP(ArgInfo.getInAllocaFieldIndex());
         Builder.restoreIP(IP);
         deferPlaceholderReplacement(Placeholder, Addr.getPointer());
       } else {
@@ -3837,22 +3857,20 @@ RValue CodeGenFunction::EmitCall(const C
         // from {}* to (%struct.foo*)*.
         if (Addr.getType() != MemType)
           Addr = Builder.CreateBitCast(Addr, MemType);
-        LValue argLV = MakeAddrLValue(Addr, I->Ty);
-        EmitInitStoreOfNonAggregate(*this, RV, argLV);
+        I->copyInto(*this, Addr);
       }
       break;
     }

     case ABIArgInfo::Indirect: {
       assert(NumIRArgs == 1);
-      if (RV.isScalar() || RV.isComplex()) {
+      if (!I->isAggregate()) {
         // Make a temporary alloca to pass the argument.
         Address Addr = CreateMemTemp(I->Ty, ArgInfo.getIndirectAlign(),
                                      "indirect-arg-temp", false);
         IRCallArgs[FirstIRArg] = Addr.getPointer();

-        LValue argLV = MakeAddrLValue(Addr, I->Ty);
-        EmitInitStoreOfNonAggregate(*this, RV, argLV);
+        I->copyInto(*this, Addr);
       } else {
         // We want to avoid creating an unnecessary temporary+copy here;
         // however, we need one in three cases:
@@ -3860,32 +3878,51 @@ RValue CodeGenFunction::EmitCall(const C
         //    source.  (This case doesn't occur on any common architecture.)
         // 2. If the argument is byval, RV is not sufficiently aligned, and
         //    we cannot force it to be sufficiently aligned.
-        // 3. If the argument is byval, but RV is located in an address space
-        //    different than that of the argument (0).
-        Address Addr = RV.getAggregateAddress();
+        // 3. If the argument is byval, but RV is not located in default
+        //    or alloca address space.
+        Address Addr = I->hasLValue()
+                           ? I->getKnownLValue().getAddress()
+                           : I->getKnownRValue().getAggregateAddress();
+        llvm::Value *V = Addr.getPointer();
         CharUnits Align = ArgInfo.getIndirectAlign();
         const llvm::DataLayout *TD = &CGM.getDataLayout();
-        const unsigned RVAddrSpace = Addr.getType()->getAddressSpace();
-        const unsigned ArgAddrSpace =
-            (FirstIRArg < IRFuncTy->getNumParams()
-                 ? IRFuncTy->getParamType(FirstIRArg)->getPointerAddressSpace()
-                 : 0);
-        if ((!ArgInfo.getIndirectByVal() && I->NeedsCopy) ||
-            (ArgInfo.getIndirectByVal() && Addr.getAlignment() < Align &&
-             llvm::getOrEnforceKnownAlignment(Addr.getPointer(),
-                                              Align.getQuantity(), *TD)
-               < Align.getQuantity()) ||
-            (ArgInfo.getIndirectByVal() && (RVAddrSpace != ArgAddrSpace))) {
+
+        assert((FirstIRArg >= IRFuncTy->getNumParams() ||
+                IRFuncTy->getParamType(FirstIRArg)->getPointerAddressSpace() ==
+                    TD->getAllocaAddrSpace()) &&
+               "indirect argument must be in alloca address space");
+
+        bool NeedCopy = false;
+
+        if (Addr.getAlignment() < Align &&
+            llvm::getOrEnforceKnownAlignment(V, Align.getQuantity(), *TD) <
+                Align.getQuantity()) {
+          NeedCopy = true;
+        } else if (I->hasLValue()) {
+          auto LV = I->getKnownLValue();
+          auto AS = LV.getAddressSpace();
+          if ((!ArgInfo.getIndirectByVal() &&
+               (LV.getAlignment() >=
+                getContext().getTypeAlignInChars(I->Ty))) ||
+              (ArgInfo.getIndirectByVal() &&
+               ((AS != LangAS::Default && AS != LangAS::opencl_private &&
+                 AS != CGM.getASTAllocaAddressSpace())))) {
+            NeedCopy = true;
+          }
+        }
+        if (NeedCopy) {
           // Create an aligned temporary, and copy to it.
           Address AI = CreateMemTemp(I->Ty, ArgInfo.getIndirectAlign(),
                                      "byval-temp", false);
           IRCallArgs[FirstIRArg] = AI.getPointer();
-          LValue Dest = MakeAddrLValue(AI, I->Ty);
-          LValue Src = MakeAddrLValue(Addr, I->Ty);
-          EmitAggregateCopy(Dest, Src, I->Ty, RV.isVolatileQualified());
+          I->copyInto(*this, AI);
         } else {
           // Skip the extra memcpy call.
-          IRCallArgs[FirstIRArg] = Addr.getPointer();
+          auto *T = V->getType()->getPointerElementType()->getPointerTo(
+              CGM.getDataLayout().getAllocaAddrSpace());
+          IRCallArgs[FirstIRArg] = getTargetHooks().performAddrSpaceCast(
+              *this, V, LangAS::Default, CGM.getASTAllocaAddressSpace(), T,
+              true);
         }
       }
       break;
@@ -3902,10 +3939,12 @@ RValue CodeGenFunction::EmitCall(const C
           ArgInfo.getDirectOffset() == 0) {
         assert(NumIRArgs == 1);
         llvm::Value *V;
-        if (RV.isScalar())
-          V = RV.getScalarVal();
+        if (!I->isAggregate())
+          V = I->getKnownRValue().getScalarVal();
         else
-          V = Builder.CreateLoad(RV.getAggregateAddress());
+          V = Builder.CreateLoad(
+              I->hasLValue() ? I->getKnownLValue().getAddress()
+                             : I->getKnownRValue().getAggregateAddress());

         // Implement swifterror by copying into a new swifterror argument.
         // We'll write back in the normal path out of the call.
@@ -3943,12 +3982,12 @@ RValue CodeGenFunction::EmitCall(const C

       // FIXME: Avoid the conversion through memory if possible.
       Address Src = Address::invalid();
-      if (RV.isScalar() || RV.isComplex()) {
+      if (!I->isAggregate()) {
         Src = CreateMemTemp(I->Ty, "coerce");
-        LValue SrcLV = MakeAddrLValue(Src, I->Ty);
-        EmitInitStoreOfNonAggregate(*this, RV, SrcLV);
+        I->copyInto(*this, Src);
       } else {
-        Src = RV.getAggregateAddress();
+        Src = I->hasLValue() ? I->getKnownLValue().getAddress()
+                             : I->getKnownRValue().getAggregateAddress();
       }

       // If the value is offset in memory, apply the offset now.
@@ -4002,9 +4041,12 @@ RValue CodeGenFunction::EmitCall(const C

       llvm::Value *tempSize = nullptr;
       Address addr = Address::invalid();
-      if (RV.isAggregate()) {
-        addr = RV.getAggregateAddress();
+      if (I->isAggregate()) {
+        addr = I->hasLValue() ? I->getKnownLValue().getAddress()
+                              : I->getKnownRValue().getAggregateAddress();
+
       } else {
+        RValue RV = I->getKnownRValue();
         assert(RV.isScalar()); // complex should always just be direct

         llvm::Type *scalarType = RV.getScalarVal()->getType();
@@ -4041,7 +4083,7 @@ RValue CodeGenFunction::EmitCall(const C

     case ABIArgInfo::Expand:
       unsigned IRArgPos = FirstIRArg;
-      ExpandTypeToArgs(I->Ty, RV, IRFuncTy, IRCallArgs, IRArgPos);
+      ExpandTypeToArgs(I->Ty, *I, IRFuncTy, IRCallArgs, IRArgPos);
       assert(IRArgPos == FirstIRArg + NumIRArgs);
       break;
     }
@@ -4393,7 +4435,7 @@ RValue CodeGenFunction::EmitCall(const C
                               OffsetValue);
     } else if (const auto *AA = TargetDecl->getAttr<AllocAlignAttr>()) {
       llvm::Value *ParamVal =
-          CallArgs[AA->getParamIndex() - 1].RV.getScalarVal();
+          CallArgs[AA->getParamIndex() - 1].getRValue(*this).getScalarVal();
       EmitAlignmentAssumption(Ret.getScalarVal(), ParamVal);
     }
   }

Modified: cfe/trunk/lib/CodeGen/CGCall.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCall.h?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGCall.h (original)
+++ cfe/trunk/lib/CodeGen/CGCall.h Wed Mar  7 13:45:40 2018
@@ -213,12 +213,46 @@ public:
   };

   struct CallArg {
-    RValue RV;
+  private:
+    union {
+      RValue RV;
+      LValue LV; /// The argument is semantically a load from this l-value.
+    };
+    bool HasLV;
+
+    /// A data-flow flag to make sure getRValue and/or copyInto are not
+    /// called twice for duplicated IR emission.
+    mutable bool IsUsed;
+
+  public:
     QualType Ty;
-    bool NeedsCopy;
-    CallArg(RValue rv, QualType ty, bool needscopy)
-    : RV(rv), Ty(ty), NeedsCopy(needscopy)
-    { }
+    CallArg(RValue rv, QualType ty)
+        : RV(rv), HasLV(false), IsUsed(false), Ty(ty) {}
+    CallArg(LValue lv, QualType ty)
+        : LV(lv), HasLV(true), IsUsed(false), Ty(ty) {}
+    bool hasLValue() const { return HasLV; }
+    QualType getType() const { return Ty; }
+
+    /// \returns an independent RValue. If the CallArg contains an LValue,
+    /// a temporary copy is returned.
+    RValue getRValue(CodeGenFunction &CGF) const;
+
+    LValue getKnownLValue() const {
+      assert(HasLV && !IsUsed);
+      return LV;
+    }
+    RValue getKnownRValue() const {
+      assert(!HasLV && !IsUsed);
+      return RV;
+    }
+    void setRValue(RValue _RV) {
+      assert(!HasLV);
+      RV = _RV;
+    }
+
+    bool isAggregate() const { return HasLV || RV.isAggregate(); }
+
+    void copyInto(CodeGenFunction &CGF, Address A) const;
   };

   /// CallArgList - Type for representing both the value and type of
@@ -248,8 +282,10 @@ public:
       llvm::Instruction *IsActiveIP;
     };

-    void add(RValue rvalue, QualType type, bool needscopy = false) {
-      push_back(CallArg(rvalue, type, needscopy));
+    void add(RValue rvalue, QualType type) { push_back(CallArg(rvalue, type)); 
}
+
+    void addUncopiedAggregate(LValue LV, QualType type) {
+      push_back(CallArg(LV, type));
     }

     /// Add all the arguments from another CallArgList to this one. After doing

Modified: cfe/trunk/lib/CodeGen/CGClass.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGClass.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGClass.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGClass.cpp Wed Mar  7 13:45:40 2018
@@ -2077,7 +2077,8 @@ void CodeGenFunction::EmitCXXConstructor
     assert(Args.size() == 2 && "unexpected argcount for trivial ctor");

     QualType SrcTy = D->getParamDecl(0)->getType().getNonReferenceType();
-    Address Src(Args[1].RV.getScalarVal(), getNaturalTypeAlignment(SrcTy));
+    Address Src(Args[1].getRValue(*this).getScalarVal(),
+                getNaturalTypeAlignment(SrcTy));
     LValue SrcLVal = MakeAddrLValue(Src, SrcTy);
     QualType DestTy = getContext().getTypeDeclType(ClassDecl);
     LValue DestLVal = MakeAddrLValue(This, DestTy);
@@ -2131,8 +2132,7 @@ void CodeGenFunction::EmitInheritedCXXCo
     const CXXConstructorDecl *D, bool ForVirtualBase, Address This,
     bool InheritedFromVBase, const CXXInheritedCtorInitExpr *E) {
   CallArgList Args;
-  CallArg ThisArg(RValue::get(This.getPointer()), D->getThisType(getContext()),
-                  /*NeedsCopy=*/false);
+  CallArg ThisArg(RValue::get(This.getPointer()), 
D->getThisType(getContext()));

   // Forward the parameters.
   if (InheritedFromVBase &&
@@ -2196,7 +2196,7 @@ void CodeGenFunction::EmitInlinedInherit
   assert(Args.size() >= Params.size() && "too few arguments for call");
   for (unsigned I = 0, N = Args.size(); I != N; ++I) {
     if (I < Params.size() && isa<ImplicitParamDecl>(Params[I])) {
-      const RValue &RV = Args[I].RV;
+      const RValue &RV = Args[I].getRValue(*this);
       assert(!RV.isComplex() && "complex indirect params not supported");
       ParamValue Val = RV.isScalar()
                            ? ParamValue::forDirect(RV.getScalarVal())

Modified: cfe/trunk/lib/CodeGen/CGDecl.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGDecl.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGDecl.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGDecl.cpp Wed Mar  7 13:45:40 2018
@@ -1882,6 +1882,22 @@ void CodeGenFunction::EmitParmDecl(const
     llvm::Type *IRTy = ConvertTypeForMem(Ty)->getPointerTo(AS);
     if (DeclPtr.getType() != IRTy)
       DeclPtr = Builder.CreateBitCast(DeclPtr, IRTy, D.getName());
+    // Indirect argument is in alloca address space, which may be different
+    // from the default address space.
+    auto AllocaAS = CGM.getASTAllocaAddressSpace();
+    auto *V = DeclPtr.getPointer();
+    auto SrcLangAS = getLangOpts().OpenCL ? LangAS::opencl_private : AllocaAS;
+    auto DestLangAS =
+        getLangOpts().OpenCL ? LangAS::opencl_private : LangAS::Default;
+    if (SrcLangAS != DestLangAS) {
+      assert(getContext().getTargetAddressSpace(SrcLangAS) ==
+             CGM.getDataLayout().getAllocaAddrSpace());
+      auto DestAS = getContext().getTargetAddressSpace(DestLangAS);
+      auto *T = V->getType()->getPointerElementType()->getPointerTo(DestAS);
+      DeclPtr = Address(getTargetHooks().performAddrSpaceCast(
+                            *this, V, SrcLangAS, DestLangAS, T, true),
+                        DeclPtr.getAlignment());
+    }

     // Push a destructor cleanup for this parameter if the ABI requires it.
     // Don't push a cleanup in a thunk for a method that will also emit a

Modified: cfe/trunk/lib/CodeGen/CGExprCXX.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGExprCXX.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGExprCXX.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGExprCXX.cpp Wed Mar  7 13:45:40 2018
@@ -265,7 +265,7 @@ RValue CodeGenFunction::EmitCXXMemberOrO
         // when it isn't necessary; just produce the proper effect here.
         LValue RHS = isa<CXXOperatorCallExpr>(CE)
                          ? MakeNaturalAlignAddrLValue(
-                               (*RtlArgs)[0].RV.getScalarVal(),
+                               (*RtlArgs)[0].getRValue(*this).getScalarVal(),
                                (*(CE->arg_begin() + 1))->getType())
                          : EmitLValue(*CE->arg_begin());
         EmitAggregateAssign(This, RHS, CE->getType());
@@ -1490,7 +1490,7 @@ static void EnterNewDeleteCleanup(CodeGe
                                            AllocAlign);
     for (unsigned I = 0, N = E->getNumPlacementArgs(); I != N; ++I) {
       auto &Arg = NewArgs[I + NumNonPlacementArgs];
-      Cleanup->setPlacementArg(I, Arg.RV, Arg.Ty);
+      Cleanup->setPlacementArg(I, Arg.getRValue(CGF), Arg.Ty);
     }

     return;
@@ -1521,8 +1521,8 @@ static void EnterNewDeleteCleanup(CodeGe
                                               AllocAlign);
   for (unsigned I = 0, N = E->getNumPlacementArgs(); I != N; ++I) {
     auto &Arg = NewArgs[I + NumNonPlacementArgs];
-    Cleanup->setPlacementArg(I, DominatingValue<RValue>::save(CGF, Arg.RV),
-                             Arg.Ty);
+    Cleanup->setPlacementArg(
+        I, DominatingValue<RValue>::save(CGF, Arg.getRValue(CGF)), Arg.Ty);
   }

   CGF.initFullExprCleanup();

Modified: cfe/trunk/lib/CodeGen/CGGPUBuiltin.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGGPUBuiltin.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGGPUBuiltin.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGGPUBuiltin.cpp Wed Mar  7 13:45:40 2018
@@ -83,8 +83,9 @@ CodeGenFunction::EmitNVPTXDevicePrintfCa
                /* ParamsToSkip = */ 0);

   // We don't know how to emit non-scalar varargs.
-  if (std::any_of(Args.begin() + 1, Args.end(),
-                  [](const CallArg &A) { return !A.RV.isScalar(); })) {
+  if (std::any_of(Args.begin() + 1, Args.end(), [&](const CallArg &A) {
+        return !A.getRValue(*this).isScalar();
+      })) {
     CGM.ErrorUnsupported(E, "non-scalar arg to printf");
     return RValue::get(llvm::ConstantInt::get(IntTy, 0));
   }
@@ -97,7 +98,7 @@ CodeGenFunction::EmitNVPTXDevicePrintfCa
   } else {
     llvm::SmallVector<llvm::Type *, 8> ArgTypes;
     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I)
-      ArgTypes.push_back(Args[I].RV.getScalarVal()->getType());
+      ArgTypes.push_back(Args[I].getRValue(*this).getScalarVal()->getType());

     // Using llvm::StructType is correct only because printf doesn't accept
     // aggregates.  If we had to handle aggregates here, we'd have to manually
@@ -109,7 +110,7 @@ CodeGenFunction::EmitNVPTXDevicePrintfCa

     for (unsigned I = 1, NumArgs = Args.size(); I < NumArgs; ++I) {
       llvm::Value *P = Builder.CreateStructGEP(AllocaTy, Alloca, I - 1);
-      llvm::Value *Arg = Args[I].RV.getScalarVal();
+      llvm::Value *Arg = Args[I].getRValue(*this).getScalarVal();
       Builder.CreateAlignedStore(Arg, P, 
DL.getPrefTypeAlignment(Arg->getType()));
     }
     BufferPtr = Builder.CreatePointerCast(Alloca, 
llvm::Type::getInt8PtrTy(Ctx));
@@ -117,6 +118,6 @@ CodeGenFunction::EmitNVPTXDevicePrintfCa

   // Invoke vprintf and return.
   llvm::Function* VprintfFunc = GetVprintfDeclaration(CGM.getModule());
-  return RValue::get(
-      Builder.CreateCall(VprintfFunc, {Args[0].RV.getScalarVal(), BufferPtr}));
+  return RValue::get(Builder.CreateCall(
+      VprintfFunc, {Args[0].getRValue(*this).getScalarVal(), BufferPtr}));
 }

Modified: cfe/trunk/lib/CodeGen/CGObjCGNU.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGObjCGNU.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGObjCGNU.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGObjCGNU.cpp Wed Mar  7 13:45:40 2018
@@ -1441,7 +1441,7 @@ CGObjCGNU::GenerateMessageSend(CodeGenFu
   }

   // Reset the receiver in case the lookup modified it
-  ActualArgs[0] = CallArg(RValue::get(Receiver), ASTIdTy, false);
+  ActualArgs[0] = CallArg(RValue::get(Receiver), ASTIdTy);

   imp = EnforceType(Builder, imp, MSI.MessengerType);


Modified: cfe/trunk/lib/CodeGen/CGObjCMac.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGObjCMac.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CGObjCMac.cpp (original)
+++ cfe/trunk/lib/CodeGen/CGObjCMac.cpp Wed Mar  7 13:45:40 2018
@@ -1708,7 +1708,7 @@ struct NullReturnState {
            e = Method->param_end(); i != e; ++i, ++I) {
         const ParmVarDecl *ParamDecl = (*i);
         if (ParamDecl->hasAttr<NSConsumedAttr>()) {
-          RValue RV = I->RV;
+          RValue RV = I->getRValue(CGF);
           assert(RV.isScalar() &&
                  "NullReturnState::complete - arg not on object");
           CGF.EmitARCRelease(RV.getScalarVal(), ARCImpreciseLifetime);
@@ -7071,7 +7071,7 @@ CGObjCNonFragileABIMac::EmitVTableMessag
             CGF.getPointerAlign());

   // Update the message ref argument.
-  args[1].RV = RValue::get(mref.getPointer());
+  args[1].setRValue(RValue::get(mref.getPointer()));

   // Load the function to call from the message ref table.
   Address calleeAddr =

Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.h
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.h?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/CodeGenFunction.h (original)
+++ cfe/trunk/lib/CodeGen/CodeGenFunction.h Wed Mar  7 13:45:40 2018
@@ -3899,10 +3899,10 @@ private:
   void ExpandTypeFromArgs(QualType Ty, LValue Dst,
                           SmallVectorImpl<llvm::Value *>::iterator &AI);

-  /// ExpandTypeToArgs - Expand an RValue \arg RV, with the LLVM type for \arg
+  /// ExpandTypeToArgs - Expand an CallArg \arg Arg, with the LLVM type for 
\arg
   /// Ty, into individual arguments on the provided vector \arg IRCallArgs,
   /// starting at index \arg IRCallArgPos. See ABIArgInfo::Expand.
-  void ExpandTypeToArgs(QualType Ty, RValue RV, llvm::FunctionType *IRFuncTy,
+  void ExpandTypeToArgs(QualType Ty, CallArg Arg, llvm::FunctionType *IRFuncTy,
                         SmallVectorImpl<llvm::Value *> &IRCallArgs,
                         unsigned &IRCallArgPos);


Modified: cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp (original)
+++ cfe/trunk/lib/CodeGen/ItaniumCXXABI.cpp Wed Mar  7 13:45:40 2018
@@ -1474,8 +1474,7 @@ CGCXXABI::AddedStructorArgs ItaniumCXXAB
   llvm::Value *VTT =
       CGF.GetVTTParameter(GlobalDecl(D, Type), ForVirtualBase, Delegating);
   QualType VTTTy = getContext().getPointerType(getContext().VoidPtrTy);
-  Args.insert(Args.begin() + 1,
-              CallArg(RValue::get(VTT), VTTTy, /*needscopy=*/false));
+  Args.insert(Args.begin() + 1, CallArg(RValue::get(VTT), VTTTy));
   return AddedStructorArgs::prefix(1);  // Added one arg.
 }


Modified: cfe/trunk/lib/CodeGen/MicrosoftCXXABI.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/MicrosoftCXXABI.cpp?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/lib/CodeGen/MicrosoftCXXABI.cpp (original)
+++ cfe/trunk/lib/CodeGen/MicrosoftCXXABI.cpp Wed Mar  7 13:45:40 2018
@@ -1537,8 +1537,7 @@ CGCXXABI::AddedStructorArgs MicrosoftCXX
   }
   RValue RV = RValue::get(MostDerivedArg);
   if (FPT->isVariadic()) {
-    Args.insert(Args.begin() + 1,
-                CallArg(RV, getContext().IntTy, /*needscopy=*/false));
+    Args.insert(Args.begin() + 1, CallArg(RV, getContext().IntTy));
     return AddedStructorArgs::prefix(1);
   }
   Args.add(RV, getContext().IntTy);

Added: cfe/trunk/test/CodeGenCXX/amdgcn-func-arg.cpp
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCXX/amdgcn-func-arg.cpp?rev=326946&view=auto
==============================================================================
--- cfe/trunk/test/CodeGenCXX/amdgcn-func-arg.cpp (added)
+++ cfe/trunk/test/CodeGenCXX/amdgcn-func-arg.cpp Wed Mar  7 13:45:40 2018
@@ -0,0 +1,94 @@
+// RUN: %clang_cc1 -O0 -triple amdgcn -emit-llvm %s -o - | FileCheck %s
+
+class A {
+public:
+  int x;
+  A():x(0) {}
+  ~A() {}
+};
+
+class B {
+int x[100];
+};
+
+A g_a;
+B g_b;
+
+void func_with_ref_arg(A &a);
+void func_with_ref_arg(B &b);
+
+// CHECK-LABEL: define void @_Z22func_with_indirect_arg1A(%class.A 
addrspace(5)* %a)
+// CHECK:  %p = alloca %class.A*, align 8, addrspace(5)
+// CHECK:  %[[r1:.+]] = addrspacecast %class.A* addrspace(5)* %p to %class.A**
+// CHECK:  %[[r0:.+]] = addrspacecast %class.A addrspace(5)* %a to %class.A*
+// CHECK:  store %class.A* %[[r0]], %class.A** %[[r1]], align 8
+void func_with_indirect_arg(A a) {
+  A *p = &a;
+}
+
+// CHECK-LABEL: define void @_Z22test_indirect_arg_autov()
+// CHECK:  %a = alloca %class.A, align 4, addrspace(5)
+// CHECK:  %[[r0:.+]] = addrspacecast %class.A addrspace(5)* %a to %class.A*
+// CHECK:  %agg.tmp = alloca %class.A, align 4, addrspace(5)
+// CHECK:  %[[r1:.+]] = addrspacecast %class.A addrspace(5)* %agg.tmp to 
%class.A*
+// CHECK:  call void @_ZN1AC1Ev(%class.A* %[[r0]])
+// CHECK:  call void @llvm.memcpy.p0i8.p0i8.i64
+// CHECK:  %[[r4:.+]] = addrspacecast %class.A* %[[r1]] to %class.A 
addrspace(5)*
+// CHECK:  call void @_Z22func_with_indirect_arg1A(%class.A addrspace(5)* 
%[[r4]])
+// CHECK:  call void @_ZN1AD1Ev(%class.A* %[[r1]])
+// CHECK:  call void @_Z17func_with_ref_argR1A(%class.A* dereferenceable(4) 
%[[r0]])
+// CHECK:  call void @_ZN1AD1Ev(%class.A* %[[r0]])
+void test_indirect_arg_auto() {
+  A a;
+  func_with_indirect_arg(a);
+  func_with_ref_arg(a);
+}
+
+// CHECK: define void @_Z24test_indirect_arg_globalv()
+// CHECK:  %agg.tmp = alloca %class.A, align 4, addrspace(5)
+// CHECK:  %[[r0:.+]] = addrspacecast %class.A addrspace(5)* %agg.tmp to 
%class.A*
+// CHECK:  call void @llvm.memcpy.p0i8.p0i8.i64
+// CHECK:  %[[r2:.+]] = addrspacecast %class.A* %[[r0]] to %class.A 
addrspace(5)*
+// CHECK:  call void @_Z22func_with_indirect_arg1A(%class.A addrspace(5)* 
%[[r2]])
+// CHECK:  call void @_ZN1AD1Ev(%class.A* %[[r0]])
+// CHECK:  call void @_Z17func_with_ref_argR1A(%class.A* dereferenceable(4) 
addrspacecast (%class.A addrspace(1)* @g_a to %class.A*))
+void test_indirect_arg_global() {
+  func_with_indirect_arg(g_a);
+  func_with_ref_arg(g_a);
+}
+
+// CHECK-LABEL: define void @_Z19func_with_byval_arg1B(%class.B addrspace(5)* 
byval align 4 %b)
+// CHECK:  %p = alloca %class.B*, align 8, addrspace(5)
+// CHECK:  %[[r1:.+]] = addrspacecast %class.B* addrspace(5)* %p to %class.B**
+// CHECK:  %[[r0:.+]] = addrspacecast %class.B addrspace(5)* %b to %class.B*
+// CHECK:  store %class.B* %[[r0]], %class.B** %[[r1]], align 8
+void func_with_byval_arg(B b) {
+  B *p = &b;
+}
+
+// CHECK-LABEL: define void @_Z19test_byval_arg_autov()
+// CHECK:  %b = alloca %class.B, align 4, addrspace(5)
+// CHECK:  %[[r0:.+]] = addrspacecast %class.B addrspace(5)* %b to %class.B*
+// CHECK:  %agg.tmp = alloca %class.B, align 4, addrspace(5)
+// CHECK:  %[[r1:.+]] = addrspacecast %class.B addrspace(5)* %agg.tmp to 
%class.B*
+// CHECK:  call void @llvm.memcpy.p0i8.p0i8.i64
+// CHECK:  %[[r4:.+]] = addrspacecast %class.B* %[[r1]] to %class.B 
addrspace(5)*
+// CHECK:  call void @_Z19func_with_byval_arg1B(%class.B addrspace(5)* byval 
align 4 %[[r4]])
+// CHECK:  call void @_Z17func_with_ref_argR1B(%class.B* dereferenceable(400) 
%[[r0]])
+void test_byval_arg_auto() {
+  B b;
+  func_with_byval_arg(b);
+  func_with_ref_arg(b);
+}
+
+// CHECK-LABEL: define void @_Z21test_byval_arg_globalv()
+// CHECK:  %agg.tmp = alloca %class.B, align 4, addrspace(5)
+// CHECK:  %[[r0:.+]] = addrspacecast %class.B addrspace(5)* %agg.tmp to 
%class.B*
+// CHECK:  call void @llvm.memcpy.p0i8.p0i8.i64
+// CHECK:  %[[r2:.+]] = addrspacecast %class.B* %[[r0]] to %class.B 
addrspace(5)*
+// CHECK:  call void @_Z19func_with_byval_arg1B(%class.B addrspace(5)* byval 
align 4 %[[r2]])
+// CHECK:  call void @_Z17func_with_ref_argR1B(%class.B* dereferenceable(400) 
addrspacecast (%class.B addrspace(1)* @g_b to %class.B*))
+void test_byval_arg_global() {
+  func_with_byval_arg(g_b);
+  func_with_ref_arg(g_b);
+}

Modified: 
cfe/trunk/test/CodeGenOpenCL/addr-space-struct-arg.cl<http://addr-space-struct-arg.cl/>
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/addr-space-struct-arg.cl?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- 
cfe/trunk/test/CodeGenOpenCL/addr-space-struct-arg.cl<http://addr-space-struct-arg.cl/>
 (original)
+++ 
cfe/trunk/test/CodeGenOpenCL/addr-space-struct-arg.cl<http://addr-space-struct-arg.cl/>
 Wed Mar  7 13:45:40 2018
@@ -1,5 +1,6 @@
 // RUN: %clang_cc1 %s -emit-llvm -o - -O0 -finclude-default-header 
-ffake-address-space-map -triple i686-pc-darwin | FileCheck -enable-var-scope 
-check-prefixes=COM,X86 %s
-// RUN: %clang_cc1 %s -emit-llvm -o - -O0 -finclude-default-header -triple 
amdgcn-amdhsa-amd | FileCheck -enable-var-scope -check-prefixes=COM,AMDGCN %s
+// RUN: %clang_cc1 %s -emit-llvm -o - -O0 -finclude-default-header -triple 
amdgcn | FileCheck -enable-var-scope -check-prefixes=COM,AMDGCN %s
+// RUN: %clang_cc1 %s -emit-llvm -o - -cl-std=CL2.0 -O0 
-finclude-default-header -triple amdgcn | FileCheck -enable-var-scope 
-check-prefixes=COM,AMDGCN,AMDGCN20 %s

 typedef struct {
   int cells[9];
@@ -35,6 +36,9 @@ struct LargeStructTwoMember {
   int2 y[20];
 };

+#if __OPENCL_C_VERSION__ >= 200
+struct LargeStructOneMember g_s;
+#endif

 // X86-LABEL: define void @foo(%struct.Mat4X4* noalias sret %agg.result, 
%struct.Mat3X3* byval align 4 %in)
 // AMDGCN-LABEL: define %struct.Mat4X4 @foo([9 x i32] %in.coerce)
@@ -80,10 +84,42 @@ void FuncOneMember(struct StructOneMembe
 }

 // AMDGCN-LABEL: define void @FuncOneLargeMember(%struct.LargeStructOneMember 
addrspace(5)* byval align 8 %u)
+// AMDGCN-NOT: addrspacecast
+// AMDGCN:   store <2 x i32> %{{.*}}, <2 x i32> addrspace(5)*
 void FuncOneLargeMember(struct LargeStructOneMember u) {
   u.x[0] = (int2)(0, 0);
 }

+// AMDGCN20-LABEL: define void @test_indirect_arg_globl()
+// AMDGCN20:  %[[byval_temp:.*]] = alloca %struct.LargeStructOneMember, align 
8, addrspace(5)
+// AMDGCN20:  %[[r0:.*]] = bitcast %struct.LargeStructOneMember addrspace(5)* 
%[[byval_temp]] to i8 addrspace(5)*
+// AMDGCN20:  call void @llvm.memcpy.p5i8.p1i8.i64(i8 addrspace(5)* align 8 
%[[r0]], i8 addrspace(1)* align 8 bitcast (%struct.LargeStructOneMember 
addrspace(1)* @g_s to i8 addrspace(1)*), i64 800, i1 false)
+// AMDGCN20:  call void @FuncOneLargeMember(%struct.LargeStructOneMember 
addrspace(5)* byval align 8 %[[byval_temp]])
+#if __OPENCL_C_VERSION__ >= 200
+void test_indirect_arg_globl(void) {
+  FuncOneLargeMember(g_s);
+}
+#endif
+
+// AMDGCN-LABEL: define amdgpu_kernel void @test_indirect_arg_local()
+// AMDGCN: %[[byval_temp:.*]] = alloca %struct.LargeStructOneMember, align 8, 
addrspace(5)
+// AMDGCN: %[[r0:.*]] = bitcast %struct.LargeStructOneMember addrspace(5)* 
%[[byval_temp]] to i8 addrspace(5)*
+// AMDGCN: call void @llvm.memcpy.p5i8.p3i8.i64(i8 addrspace(5)* align 8 
%[[r0]], i8 addrspace(3)* align 8 bitcast (%struct.LargeStructOneMember 
addrspace(3)* @test_indirect_arg_local.l_s to i8 addrspace(3)*), i64 800, i1 
false)
+// AMDGCN: call void @FuncOneLargeMember(%struct.LargeStructOneMember 
addrspace(5)* byval align 8 %[[byval_temp]])
+kernel void test_indirect_arg_local(void) {
+  local struct LargeStructOneMember l_s;
+  FuncOneLargeMember(l_s);
+}
+
+// AMDGCN-LABEL: define void @test_indirect_arg_private()
+// AMDGCN: %[[p_s:.*]] = alloca %struct.LargeStructOneMember, align 8, 
addrspace(5)
+// AMDGCN-NOT: @llvm.memcpy
+// AMDGCN-NEXT: call void @FuncOneLargeMember(%struct.LargeStructOneMember 
addrspace(5)* byval align 8 %[[p_s]])
+void test_indirect_arg_private(void) {
+  struct LargeStructOneMember p_s;
+  FuncOneLargeMember(p_s);
+}
+
 // AMDGCN-LABEL: define amdgpu_kernel void @KernelOneMember
 // AMDGCN-SAME:  (<2 x i32> %[[u_coerce:.*]])
 // AMDGCN:  %[[u:.*]] = alloca %struct.StructOneMember, align 8, addrspace(5)
@@ -112,7 +148,6 @@ void FuncLargeTwoMember(struct LargeStru
   u.y[0] = (int2)(0, 0);
 }

-
 // AMDGCN-LABEL: define amdgpu_kernel void @KernelTwoMember
 // AMDGCN-SAME:  (%struct.StructTwoMember %[[u_coerce:.*]])
 // AMDGCN:  %[[u:.*]] = alloca %struct.StructTwoMember, align 8, addrspace(5)

Modified: cfe/trunk/test/CodeGenOpenCL/byval.cl<http://byval.cl/>
URL: 
http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenOpenCL/byval.cl?rev=326946&r1=326945&r2=326946&view=diff
==============================================================================
--- cfe/trunk/test/CodeGenOpenCL/byval.cl<http://byval.cl/> (original)
+++ cfe/trunk/test/CodeGenOpenCL/byval.cl<http://byval.cl/> Wed Mar  7 13:45:40 
2018
@@ -1,5 +1,4 @@
 // RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn %s | FileCheck %s
-// RUN: %clang_cc1 -emit-llvm -o - -triple amdgcn---opencl %s | FileCheck %s

 struct A {
   int x[100];


_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org<mailto:cfe-commits@lists.llvm.org>
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to