Please check whether r237007 fixes the failure for you.
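
For reference, here is the CHECK line that currently fails next to a minimal
sketch of the relaxed constraint (this assumes the fix simply drops the
explicit alignment from the CHECK line, as described in the quoted thread
below; the exact r237007 change may differ):

  // Before: requires "align 1", which fails on s390x where the string is emitted with "align 2".
  // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
  // After: leave the alignment unconstrained so any target value matches.
  // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"

--Artem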
On Mon, May 11, 2015 at 11:34 AM, Artem Belevich <[email protected]> wrote:

> It looks like on s390x strings come with 'align 2'. I'll remove align from
> the CHECK constraint and that should fix the test failure on s390x. I'll
> commit the fix shortly.
>
> --Artem
>
> On Mon, May 11, 2015 at 10:43 AM, Artem Belevich <[email protected]> wrote:
>
>> Could you send me output of the CC1 executed by the test before it's
>> piped into FileCheck?
>>
>> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 -internal-isystem
>> /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include
>> -nostdsysteminc -emit-llvm
>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu
>> -fcuda-include-gpubinary
>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu
>> -o -
>>
>> Oh, and I see a typo in the script -- "CHEKC: call{{.*}}kernelfunc",
>> though it's probably not what breaks the test in your case.
>>
>> --Artem
>>
>> On Sat, May 9, 2015 at 11:10 AM, Hubert Tong <[email protected]> wrote:
>>
>>> Hi Artem,
>>>
>>> I am encountering a failure with device-stub.cu on s390x-suse-linux.
>>> Can you take a look?
>>>
>>> *Output:*
>>> FAIL: Clang :: CodeGenCUDA/device-stub.cu (1986 of 21893)
>>> ******************** TEST 'Clang :: CodeGenCUDA/device-stub.cu' FAILED ********************
>>> Script:
>>> --
>>> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 -internal-isystem /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include -nostdsysteminc -emit-llvm /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -fcuda-include-gpubinary /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -o - | /scratch/hstong/workdir/Release+Asserts/bin/FileCheck /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu
>>> --
>>> Exit Code: 1
>>>
>>> Command Output (stderr):
>>> --
>>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu:7:11: error: expected string not found in input
>>> // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
>>>           ^
>>> <stdin>:1:1: note: scanning from here
>>> ; ModuleID = '/gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu'
>>> ^
>>> <stdin>:13:298: note: possible intended match here
>>> @1 = private unnamed_addr constant [2259 x i8] c"// RUN: %clang_cc1 -emit-llvm %s -fcuda-include-gpubinary %s -o - | FileCheck %s\0A\0A#include \22Inputs/cuda.h\22\0A\0A// Make sure that all parts of GPU code init/cleanup are there:\0A// * constant unnamed string with the kernel name\0A// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\5C00\22, align 1\0A// * constant unnamed string with GPU binary\0A// CHECK: private unnamed_addr constant{{.*}}\5C00\22\0A// * constant struct that wraps GPU binary\0A// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* } \0A// CHECK: { i32 1180844977, i32 1, {{.*}}, i8* null }\0A// * variable to save GPU binary handle after initialization\0A// CHECK: @__cuda_gpubin_handle = internal global i8** null\0A// * Make sure our constructor/destructor was added to global ctor/dtor list.\0A// CHECK: @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor\0A// CHECK: @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor\0A\0A// Test that we build the correct number of calls to cudaSetupArgument followed\0A// by a call to cudaLaunch.\0A\0A// CHECK: define{{.*}}kernelfunc\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaLaunch\0A__global__ void kernelfunc(int i, int j, int k) {}\0A\0A// Test that we've built correct kernel launch sequence.\0A// CHECK: define{{.*}}hostfunc\0A// CHECK: call{{.*}}cudaConfigureCall\0A// CHEKC: call{{.*}}kernelfunc\0Avoid hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }\0A\0A// Test that we've built a function to register kernels\0A// CHECK: define internal void @__cuda_register_kernels\0A// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc\0A\0A// Test that we've built contructor..\0A// CHECK: define internal void @__cuda_module_ctor\0A// .. that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)\0A// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper\0A// .. stores return value in __cuda_gpubin_handle\0A// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle\0A// .. and then calls __cuda_register_kernels\0A// CHECK-NEXT: call void @__cuda_register_kernels\0A\0A// Test that we've created destructor.\0A// CHECK: define internal void @__cuda_module_dtor\0A// CHECK: load{{.*}}__cuda_gpubin_handle\0A// CHECK-NEXT: call void @__cudaUnregisterFatBinary\0A\0A\00", align 2
>>> ^
>>>
>>> --
>>>
>>> ********************
>>>
>>> *Build environment info:*
>>> > g++ -v
>>> Using built-in specs.
>>> COLLECT_GCC=g++
>>> COLLECT_LTO_WRAPPER=/usr/lib64/gcc/s390x-suse-linux/4.8/lto-wrapper
>>> Target: s390x-suse-linux
>>> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8 --enable-ssp --disable-libssp --disable-plugin --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8 --without-system-libunwind --with-tune=zEC12 --with-arch=z196 --with-long-double-128 --enable-decimal-float --build=s390x-suse-linux --host=s390x-suse-linux
>>> Thread model: posix
>>> gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux)
>>>
>>> Thanks,
>>>
>>> Hubert Tong
>>>
>>> On Thu, May 7, 2015 at 2:34 PM, Artem Belevich <[email protected]> wrote:
>>>
>>>> Author: tra
>>>> Date: Thu May 7 14:34:16 2015
>>>> New Revision: 236765
>>>>
>>>> URL: http://llvm.org/viewvc/llvm-project?rev=236765&view=rev
>>>> Log:
>>>> [cuda] Include GPU binary into host object file and generate
>>>> init/deinit code.
>>>>
>>>> - added -fcuda-include-gpubinary option to incorporate results of
>>>>   device-side compilation into host-side one.
>>>> - generate code to register GPU binaries and associated kernels
>>>>   with CUDA runtime and clean-up on exit.
>>>> - added test case for init/deinit code generation.
>>>>
>>>> Differential Revision: http://reviews.llvm.org/D9507
>>>>
>>>> Modified:
>>>>     cfe/trunk/include/clang/Driver/CC1Options.td
>>>>     cfe/trunk/include/clang/Frontend/CodeGenOptions.h
>>>>     cfe/trunk/lib/CodeGen/CGCUDANV.cpp
>>>>     cfe/trunk/lib/CodeGen/CGCUDARuntime.h
>>>>     cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
>>>>     cfe/trunk/lib/CodeGen/CodeGenModule.cpp
>>>>     cfe/trunk/lib/Frontend/CompilerInvocation.cpp
>>>>     cfe/trunk/test/CodeGenCUDA/device-stub.cu
>>>>
>>>> Modified: cfe/trunk/include/clang/Driver/CC1Options.td
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/CC1Options.td?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Driver/CC1Options.td (original)
>>>> +++ cfe/trunk/include/clang/Driver/CC1Options.td Thu May 7 14:34:16 2015
>>>> @@ -631,6 +631,8 @@ def fcuda_allow_host_calls_from_host_dev
>>>>  def fcuda_disable_target_call_checks : Flag<["-"], "fcuda-disable-target-call-checks">,
>>>>    HelpText<"Disable all cross-target (host, device, etc.) call checks in CUDA">;
>>>> +def fcuda_include_gpubinary : Separate<["-"], "fcuda-include-gpubinary">,
>>>> +  HelpText<"Incorporate CUDA device-side binary into host object file.">;
>>>>
>>>>  } // let Flags = [CC1Option]
>>>>
>>>>
>>>> Modified: cfe/trunk/include/clang/Frontend/CodeGenOptions.h
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Frontend/CodeGenOptions.h?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/include/clang/Frontend/CodeGenOptions.h (original)
>>>> +++ cfe/trunk/include/clang/Frontend/CodeGenOptions.h Thu May 7 14:34:16 2015
>>>> @@ -163,6 +163,11 @@ public:
>>>>    /// Name of the profile file to use as input for -fprofile-instr-use
>>>>    std::string InstrProfileInput;
>>>>
>>>> +  /// A list of file names passed with -fcuda-include-gpubinary options to
>>>> +  /// forward to CUDA runtime back-end for incorporating them into host-side
>>>> +  /// object file.
>>>> +  std::vector<std::string> CudaGpuBinaryFileNames;
>>>> +
>>>>    /// Regular expression to select optimizations for which we should enable
>>>>    /// optimization remarks. Transformation passes whose name matches this
>>>>    /// expression (and support this feature), will emit a diagnostic
>>>>
>>>> Modified: cfe/trunk/lib/CodeGen/CGCUDANV.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDANV.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/CodeGen/CGCUDANV.cpp (original)
>>>> +++ cfe/trunk/lib/CodeGen/CGCUDANV.cpp Thu May 7 14:34:16 2015
>>>> @@ -20,7 +20,6 @@
>>>>  #include "llvm/IR/CallSite.h"
>>>>  #include "llvm/IR/Constants.h"
>>>>  #include "llvm/IR/DerivedTypes.h"
>>>> -#include <vector>
>>>>
>>>>  using namespace clang;
>>>>  using namespace CodeGen;
>>>> @@ -30,29 +29,66 @@ namespace {
>>>>  class CGNVCUDARuntime : public CGCUDARuntime {
>>>>
>>>>  private:
>>>> -  llvm::Type *IntTy, *SizeTy;
>>>> -  llvm::PointerType *CharPtrTy, *VoidPtrTy;
>>>> +  llvm::Type *IntTy, *SizeTy, *VoidTy;
>>>> +  llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy;
>>>> +
>>>> +  /// Convenience reference to LLVM Context
>>>> +  llvm::LLVMContext &Context;
>>>> +  /// Convenience reference to the current module
>>>> +  llvm::Module &TheModule;
>>>> +  /// Keeps track of kernel launch stubs emitted in this module
>>>> +  llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
>>>> +  /// Keeps track of variables containing handles of GPU binaries. Populated by
>>>> +  /// ModuleCtorFunction() and used to create corresponding cleanup calls in
>>>> +  /// ModuleDtorFunction()
>>>> +  llvm::SmallVector<llvm::GlobalVariable *, 16> GpuBinaryHandles;
>>>>
>>>>    llvm::Constant *getSetupArgumentFn() const;
>>>>    llvm::Constant *getLaunchFn() const;
>>>>
>>>> +  /// Creates a function to register all kernel stubs generated in this module.
>>>> +  llvm::Function *makeRegisterKernelsFn();
>>>> +
>>>> +  /// Helper function that generates a constant string and returns a pointer to
>>>> +  /// the start of the string. The result of this function can be used anywhere
>>>> +  /// where the C code specifies const char*.
>>>> +  llvm::Constant *makeConstantString(const std::string &Str,
>>>> +                                     const std::string &Name = "",
>>>> +                                     unsigned Alignment = 0) {
>>>> +    llvm::Constant *Zeros[] = {llvm::ConstantInt::get(SizeTy, 0),
>>>> +                               llvm::ConstantInt::get(SizeTy, 0)};
>>>> +    auto *ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str());
>>>> +    return llvm::ConstantExpr::getGetElementPtr(ConstStr->getValueType(),
>>>> +                                                ConstStr, Zeros);
>>>> +  }
>>>> +
>>>> +  void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args);
>>>> +
>>>>  public:
>>>>    CGNVCUDARuntime(CodeGenModule &CGM);
>>>>
>>>> -  void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args) override;
>>>> +  void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) override;
>>>> +  /// Creates module constructor function
>>>> +  llvm::Function *makeModuleCtorFunction() override;
>>>> +  /// Creates module destructor function
>>>> +  llvm::Function *makeModuleDtorFunction() override;
>>>>  };
>>>>
>>>>  }
>>>>
>>>> -CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) : CGCUDARuntime(CGM) {
>>>> +CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM)
>>>> +    : CGCUDARuntime(CGM), Context(CGM.getLLVMContext()),
>>>> +      TheModule(CGM.getModule()) {
>>>>    CodeGen::CodeGenTypes &Types = CGM.getTypes();
>>>>    ASTContext &Ctx = CGM.getContext();
>>>>
>>>>    IntTy = Types.ConvertType(Ctx.IntTy);
>>>>    SizeTy = Types.ConvertType(Ctx.getSizeType());
>>>> +  VoidTy = llvm::Type::getVoidTy(Context);
>>>>
>>>>    CharPtrTy = llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy));
>>>>    VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy));
>>>> +  VoidPtrPtrTy = VoidPtrTy->getPointerTo();
>>>>  }
>>>>
>>>>  llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const {
>>>> @@ -68,14 +104,17 @@ llvm::Constant *CGNVCUDARuntime::getSetu
>>>>
>>>>  llvm::Constant *CGNVCUDARuntime::getLaunchFn() const {
>>>>    // cudaError_t cudaLaunch(char *)
>>>> -  std::vector<llvm::Type*> Params;
>>>> -  Params.push_back(CharPtrTy);
>>>> -  return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy,
>>>> -                                                           Params, false),
>>>> -                                   "cudaLaunch");
>>>> +  return CGM.CreateRuntimeFunction(
>>>> +      llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch");
>>>> +}
>>>> +
>>>> +void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF,
>>>> +                                     FunctionArgList &Args) {
>>>> +  EmittedKernels.push_back(CGF.CurFn);
>>>> +  emitDeviceStubBody(CGF, Args);
>>>>  }
>>>>
>>>> -void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF,
>>>> +void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF,
>>>>                                           FunctionArgList &Args) {
>>>>    // Build the argument value list and the argument stack struct type.
>>>>    SmallVector<llvm::Value *, 16> ArgValues;
>>>> @@ -87,8 +126,7 @@ void CGNVCUDARuntime::EmitDeviceStubBody
>>>>      assert(isa<llvm::PointerType>(V->getType()) && "Arg type not PointerType");
>>>>      ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType());
>>>>    }
>>>> -  llvm::StructType *ArgStackTy = llvm::StructType::get(
>>>> -      CGF.getLLVMContext(), ArgTypes);
>>>> +  llvm::StructType *ArgStackTy = llvm::StructType::get(Context, ArgTypes);
>>>>
>>>>    llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end");
>>>>
>>>> @@ -120,6 +158,160 @@ void CGNVCUDARuntime::EmitDeviceStubBody
>>>>    CGF.EmitBlock(EndBlock);
>>>>  }
>>>>
>>>> +/// Creates internal function to register all kernel stubs generated in this
>>>> +/// module with the CUDA runtime.
>>>> +/// \code >>>> +/// void __cuda_register_kernels(void** GpuBinaryHandle) { >>>> +/// __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...); >>>> +/// ... >>>> +/// __cudaRegisterFunction(GpuBinaryHandle,KernelM,...); >>>> +/// } >>>> +/// \endcode >>>> +llvm::Function *CGNVCUDARuntime::makeRegisterKernelsFn() { >>>> + llvm::Function *RegisterKernelsFunc = llvm::Function::Create( >>>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false), >>>> + llvm::GlobalValue::InternalLinkage, "__cuda_register_kernels", >>>> &TheModule); >>>> + llvm::BasicBlock *EntryBB = >>>> + llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc); >>>> + CGBuilderTy Builder(Context); >>>> + Builder.SetInsertPoint(EntryBB); >>>> + >>>> + // void __cudaRegisterFunction(void **, const char *, char *, const >>>> char *, >>>> + // int, uint3*, uint3*, dim3*, dim3*, >>>> int*) >>>> + std::vector<llvm::Type *> RegisterFuncParams = { >>>> + VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy, >>>> + VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, >>>> IntTy->getPointerTo()}; >>>> + llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction( >>>> + llvm::FunctionType::get(IntTy, RegisterFuncParams, false), >>>> + "__cudaRegisterFunction"); >>>> + >>>> + // Extract GpuBinaryHandle passed as the first argument passed to >>>> + // __cuda_register_kernels() and generate __cudaRegisterFunction() >>>> call for >>>> + // each emitted kernel. >>>> + llvm::Argument &GpuBinaryHandlePtr = >>>> *RegisterKernelsFunc->arg_begin(); >>>> + for (llvm::Function *Kernel : EmittedKernels) { >>>> + llvm::Constant *KernelName = makeConstantString(Kernel->getName()); >>>> + llvm::Constant *NullPtr = >>>> llvm::ConstantPointerNull::get(VoidPtrTy); >>>> + llvm::Value *args[] = { >>>> + &GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy), >>>> + KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), >>>> NullPtr, >>>> + NullPtr, NullPtr, NullPtr, >>>> + llvm::ConstantPointerNull::get(IntTy->getPointerTo())}; >>>> + Builder.CreateCall(RegisterFunc, args); >>>> + } >>>> + >>>> + Builder.CreateRetVoid(); >>>> + return RegisterKernelsFunc; >>>> +} >>>> + >>>> +/// Creates a global constructor function for the module: >>>> +/// \code >>>> +/// void __cuda_module_ctor(void*) { >>>> +/// Handle0 = __cudaRegisterFatBinary(GpuBinaryBlob0); >>>> +/// __cuda_register_kernels(Handle0); >>>> +/// ... 
>>>> +/// HandleN = __cudaRegisterFatBinary(GpuBinaryBlobN); >>>> +/// __cuda_register_kernels(HandleN); >>>> +/// } >>>> +/// \endcode >>>> +llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { >>>> + // void __cuda_register_kernels(void* handle); >>>> + llvm::Function *RegisterKernelsFunc = makeRegisterKernelsFn(); >>>> + // void ** __cudaRegisterFatBinary(void *); >>>> + llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction( >>>> + llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false), >>>> + "__cudaRegisterFatBinary"); >>>> + // struct { int magic, int version, void * gpu_binary, void * >>>> dont_care }; >>>> + llvm::StructType *FatbinWrapperTy = >>>> + llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy, >>>> nullptr); >>>> + >>>> + llvm::Function *ModuleCtorFunc = llvm::Function::Create( >>>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false), >>>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor", >>>> &TheModule); >>>> + llvm::BasicBlock *CtorEntryBB = >>>> + llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc); >>>> + CGBuilderTy CtorBuilder(Context); >>>> + >>>> + CtorBuilder.SetInsertPoint(CtorEntryBB); >>>> + >>>> + // For each GPU binary, register it with the CUDA runtime and store >>>> returned >>>> + // handle in a global variable and save the handle in >>>> GpuBinaryHandles vector >>>> + // to be cleaned up in destructor on exit. Then associate all known >>>> kernels >>>> + // with the GPU binary handle so CUDA runtime can figure out what to >>>> call on >>>> + // the GPU side. >>>> + for (const std::string &GpuBinaryFileName : >>>> + CGM.getCodeGenOpts().CudaGpuBinaryFileNames) { >>>> + llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr = >>>> + llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName); >>>> + if (std::error_code EC = GpuBinaryOrErr.getError()) { >>>> + CGM.getDiags().Report(diag::err_cannot_open_file) << >>>> GpuBinaryFileName >>>> + << >>>> EC.message(); >>>> + continue; >>>> + } >>>> + >>>> + // Create initialized wrapper structure that points to the loaded >>>> GPU binary >>>> + llvm::Constant *Values[] = { >>>> + llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper >>>> magic. >>>> + llvm::ConstantInt::get(IntTy, 1), // Fatbin version. >>>> + makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", 16), >>>> // Data. >>>> + llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in >>>> fatbin v1. >>>> + llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable( >>>> + TheModule, FatbinWrapperTy, true, >>>> llvm::GlobalValue::InternalLinkage, >>>> + llvm::ConstantStruct::get(FatbinWrapperTy, Values), >>>> + "__cuda_fatbin_wrapper"); >>>> + >>>> + // GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper); >>>> + llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall( >>>> + RegisterFatbinFunc, >>>> + CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy)); >>>> + llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable( >>>> + TheModule, VoidPtrPtrTy, false, >>>> llvm::GlobalValue::InternalLinkage, >>>> + llvm::ConstantPointerNull::get(VoidPtrPtrTy), >>>> "__cuda_gpubin_handle"); >>>> + CtorBuilder.CreateStore(RegisterFatbinCall, GpuBinaryHandle, >>>> false); >>>> + >>>> + // Call __cuda_register_kernels(GpuBinaryHandle); >>>> + CtorBuilder.CreateCall(RegisterKernelsFunc, RegisterFatbinCall); >>>> + >>>> + // Save GpuBinaryHandle so we can unregister it in destructor. 
>>>> + GpuBinaryHandles.push_back(GpuBinaryHandle); >>>> + } >>>> + >>>> + CtorBuilder.CreateRetVoid(); >>>> + return ModuleCtorFunc; >>>> +} >>>> + >>>> +/// Creates a global destructor function that unregisters all GPU code >>>> blobs >>>> +/// registered by constructor. >>>> +/// \code >>>> +/// void __cuda_module_dtor(void*) { >>>> +/// __cudaUnregisterFatBinary(Handle0); >>>> +/// ... >>>> +/// __cudaUnregisterFatBinary(HandleN); >>>> +/// } >>>> +/// \endcode >>>> +llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() { >>>> + // void __cudaUnregisterFatBinary(void ** handle); >>>> + llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction( >>>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false), >>>> + "__cudaUnregisterFatBinary"); >>>> + >>>> + llvm::Function *ModuleDtorFunc = llvm::Function::Create( >>>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false), >>>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor", >>>> &TheModule); >>>> + llvm::BasicBlock *DtorEntryBB = >>>> + llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc); >>>> + CGBuilderTy DtorBuilder(Context); >>>> + DtorBuilder.SetInsertPoint(DtorEntryBB); >>>> + >>>> + for (llvm::GlobalVariable *GpuBinaryHandle : GpuBinaryHandles) { >>>> + DtorBuilder.CreateCall(UnregisterFatbinFunc, >>>> + DtorBuilder.CreateLoad(GpuBinaryHandle, >>>> false)); >>>> + } >>>> + >>>> + DtorBuilder.CreateRetVoid(); >>>> + return ModuleDtorFunc; >>>> +} >>>> + >>>> CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) { >>>> return new CGNVCUDARuntime(CGM); >>>> } >>>> >>>> Modified: cfe/trunk/lib/CodeGen/CGCUDARuntime.h >>>> URL: >>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDARuntime.h?rev=236765&r1=236764&r2=236765&view=diff >>>> >>>> ============================================================================== >>>> --- cfe/trunk/lib/CodeGen/CGCUDARuntime.h (original) >>>> +++ cfe/trunk/lib/CodeGen/CGCUDARuntime.h Thu May 7 14:34:16 2015 >>>> @@ -16,6 +16,10 @@ >>>> #ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H >>>> #define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H >>>> >>>> +namespace llvm { >>>> +class Function; >>>> +} >>>> + >>>> namespace clang { >>>> >>>> class CUDAKernelCallExpr; >>>> @@ -39,10 +43,17 @@ public: >>>> virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF, >>>> const CUDAKernelCallExpr *E, >>>> ReturnValueSlot ReturnValue); >>>> - >>>> - virtual void EmitDeviceStubBody(CodeGenFunction &CGF, >>>> - FunctionArgList &Args) = 0; >>>> >>>> + /// Emits a kernel launch stub. >>>> + virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList >>>> &Args) = 0; >>>> + >>>> + /// Constructs and returns a module initialization function or >>>> nullptr if it's >>>> + /// not needed. Must be called after all kernels have been emitted. >>>> + virtual llvm::Function *makeModuleCtorFunction() = 0; >>>> + >>>> + /// Returns a module cleanup function or nullptr if it's not needed. >>>> + /// Must be called after ModuleCtorFunction >>>> + virtual llvm::Function *makeModuleDtorFunction() = 0; >>>> }; >>>> >>>> /// Creates an instance of a CUDA runtime class. 
>>>>
>>>> Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/CodeGen/CodeGenFunction.cpp (original)
>>>> +++ cfe/trunk/lib/CodeGen/CodeGenFunction.cpp Thu May 7 14:34:16 2015
>>>> @@ -878,7 +878,7 @@ void CodeGenFunction::GenerateCode(Globa
>>>>    else if (getLangOpts().CUDA &&
>>>>             !getLangOpts().CUDAIsDevice &&
>>>>             FD->hasAttr<CUDAGlobalAttr>())
>>>> -    CGM.getCUDARuntime().EmitDeviceStubBody(*this, Args);
>>>> +    CGM.getCUDARuntime().emitDeviceStub(*this, Args);
>>>>    else if (isa<CXXConversionDecl>(FD) &&
>>>>             cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) {
>>>>      // The lambda conversion to block pointer is special; the semantics can't be
>>>>
>>>> Modified: cfe/trunk/lib/CodeGen/CodeGenModule.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenModule.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/CodeGen/CodeGenModule.cpp (original)
>>>> +++ cfe/trunk/lib/CodeGen/CodeGenModule.cpp Thu May 7 14:34:16 2015
>>>> @@ -350,6 +350,13 @@ void CodeGenModule::Release() {
>>>>    if (ObjCRuntime)
>>>>      if (llvm::Function *ObjCInitFunction = ObjCRuntime->ModuleInitFunction())
>>>>        AddGlobalCtor(ObjCInitFunction);
>>>> +  if (Context.getLangOpts().CUDA && !Context.getLangOpts().CUDAIsDevice &&
>>>> +      CUDARuntime) {
>>>> +    if (llvm::Function *CudaCtorFunction = CUDARuntime->makeModuleCtorFunction())
>>>> +      AddGlobalCtor(CudaCtorFunction);
>>>> +    if (llvm::Function *CudaDtorFunction = CUDARuntime->makeModuleDtorFunction())
>>>> +      AddGlobalDtor(CudaDtorFunction);
>>>> +  }
>>>>    if (PGOReader && PGOStats.hasDiagnostics())
>>>>      PGOStats.reportDiagnostics(getDiags(), getCodeGenOpts().MainFileName);
>>>>    EmitCtorList(GlobalCtors, "llvm.global_ctors");
>>>> @@ -3678,4 +3685,3 @@ void CodeGenModule::EmitOMPThreadPrivate
>>>>      CXXGlobalInits.push_back(InitFunction);
>>>>    }
>>>>  }
>>>> -
>>>>
>>>> Modified: cfe/trunk/lib/Frontend/CompilerInvocation.cpp
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInvocation.cpp?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/lib/Frontend/CompilerInvocation.cpp (original)
>>>> +++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp Thu May 7 14:34:16 2015
>>>> @@ -651,6 +651,9 @@ static bool ParseCodeGenArgs(CodeGenOpti
>>>>                        Args.getAllArgValues(OPT_fsanitize_recover_EQ), Diags,
>>>>                        Opts.SanitizeRecover);
>>>>
>>>> +  Opts.CudaGpuBinaryFileNames =
>>>> +      Args.getAllArgValues(OPT_fcuda_include_gpubinary);
>>>> +
>>>>    return Success;
>>>>  }
>>>>
>>>>
>>>> Modified: cfe/trunk/test/CodeGenCUDA/device-stub.cu
>>>> URL: http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/device-stub.cu?rev=236765&r1=236764&r2=236765&view=diff
>>>> ==============================================================================
>>>> --- cfe/trunk/test/CodeGenCUDA/device-stub.cu (original)
>>>> +++ cfe/trunk/test/CodeGenCUDA/device-stub.cu Thu May 7 14:34:16 2015
>>>> @@ -1,7 +1,21 @@
>>>> -// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s
>>>> +// RUN: %clang_cc1 -emit-llvm %s -fcuda-include-gpubinary %s -o - | FileCheck %s
>>>>
>>>>  #include "Inputs/cuda.h"
>>>>
>>>> +// Make sure that all parts of GPU code init/cleanup are there:
>>>> +// * constant unnamed string with the kernel name
>>>> +// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
>>>> +// * constant unnamed string with GPU binary
>>>> +// CHECK: private unnamed_addr constant{{.*}}\00"
>>>> +// * constant struct that wraps GPU binary
>>>> +// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* }
>>>> +// CHECK: { i32 1180844977, i32 1, {{.*}}, i64 0, i64 0), i8* null }
>>>> +// * variable to save GPU binary handle after initialization
>>>> +// CHECK: @__cuda_gpubin_handle = internal global i8** null
>>>> +// * Make sure our constructor/destructor was added to global ctor/dtor list.
>>>> +// CHECK: @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor
>>>> +// CHECK: @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor
>>>> +
>>>>  // Test that we build the correct number of calls to cudaSetupArgument followed
>>>>  // by a call to cudaLaunch.
>>>>
>>>> @@ -11,3 +25,28 @@
>>>>  // CHECK: call{{.*}}cudaSetupArgument
>>>>  // CHECK: call{{.*}}cudaLaunch
>>>>  __global__ void kernelfunc(int i, int j, int k) {}
>>>> +
>>>> +// Test that we've built correct kernel launch sequence.
>>>> +// CHECK: define{{.*}}hostfunc
>>>> +// CHECK: call{{.*}}cudaConfigureCall
>>>> +// CHEKC: call{{.*}}kernelfunc
>>>> +void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); }
>>>> +
>>>> +// Test that we've built a function to register kernels
>>>> +// CHECK: define internal void @__cuda_register_kernels
>>>> +// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc
>>>> +
>>>> +// Test that we've built contructor..
>>>> +// CHECK: define internal void @__cuda_module_ctor
>>>> +// .. that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)
>>>> +// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper
>>>> +// .. stores return value in __cuda_gpubin_handle
>>>> +// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle
>>>> +// .. and then calls __cuda_register_kernels
>>>> +// CHECK-NEXT: call void @__cuda_register_kernels
>>>> +
>>>> +// Test that we've created destructor.
>>>> +// CHECK: define internal void @__cuda_module_dtor
>>>> +// CHECK: load{{.*}}__cuda_gpubin_handle
>>>> +// CHECK-NEXT: call void @__cudaUnregisterFatBinary
>>>> +
>>>>
>>>>
>>>> _______________________________________________
>>>> cfe-commits mailing list
>>>> [email protected]
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
>>>>
>>>
>>>
>>
>>
>> --
>> --Artem Belevich
>>
>
>
>
> --
> --Artem Belevich
>

--
--Artem Belevich
_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
