Hi Artem,

The issue seems to be resolved now.
Thanks, Hubert Tong On Mon, May 11, 2015 at 1:40 PM, Artem Belevich <[email protected]> wrote: > Please check whether r237007 fixes the failure for you. > > --Artem > > On Mon, May 11, 2015 at 11:34 AM, Artem Belevich <[email protected]> wrote: > >> It looks like on s390x strings come with 'align 2'. I'll remove align >> from the CHECK constraint and that should fix the test failure on s390x. >> I'll commit the fix shortly. >> >> --Artem >> >> On Mon, May 11, 2015 at 10:43 AM, Artem Belevich <[email protected]> wrote: >> >>> Could you send me output of the CC1 executed by the test before it's >>> piped into FileCheck? >>> >>> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 >>> -internal-isystem >>> /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include >>> -nostdsysteminc -emit-llvm /gsa/tlbgsa-h1/08/hstong/pub/ >>> cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -fcuda-include-gpubinary >>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >>> device-stub.cu -o - >>> >>> Oh, and I see a typo in the script -- "CHEKC: call{{.*}}kernelfunc", >>> though it's probably not what breaks the test in your case. >>> >>> --Artem >>> >>> On Sat, May 9, 2015 at 11:10 AM, Hubert Tong < >>> [email protected]> wrote: >>> >>>> Hi Artem, >>>> >>>> I am encountering a failure with device-stub.cu on s390x-suse-linux. >>>> Can you take a look? >>>> >>>> *Output:* >>>> FAIL: Clang :: CodeGenCUDA/device-stub.cu (1986 of 21893) >>>> ******************** TEST 'Clang :: CodeGenCUDA/device-stub.cu' FAILED >>>> ******************** >>>> Script: >>>> -- >>>> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 >>>> -internal-isystem >>>> /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include >>>> -nostdsysteminc -emit-llvm >>>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >>>> device-stub.cu -fcuda-include-gpubinary >>>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >>>> device-stub.cu -o - | >>>> /scratch/hstong/workdir/Release+Asserts/bin/FileCheck >>>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >>>> device-stub.cu >>>> -- >>>> Exit Code: 1 >>>> >>>> Command Output (stderr): >>>> -- >>>> >>>> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu:7:11: >>>> error: expected string not found in input >>>> // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", >>>> align 1 >>>> ^ >>>> <stdin>:1:1: note: scanning from here >>>> ; ModuleID = >>>> '/gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >>>> device-stub.cu' >>>> ^ >>>> <stdin>:13:298: note: possible intended match here >>>> @1 = private unnamed_addr constant [2259 x i8] c"// RUN: %clang_cc1 >>>> -emit-llvm %s -fcuda-include-gpubinary %s -o - | FileCheck %s\0A\0A#include >>>> \22Inputs/cuda.h\22\0A\0A// Make sure that all parts of GPU code >>>> init/cleanup are there:\0A// * constant unnamed string with the kernel >>>> name\0A// CHECK: private unnamed_addr >>>> constant{{.*}}kernelfunc{{.*}}\5C00\22, align 1\0A// * constant unnamed >>>> string with GPU binary\0A// CHECK: private unnamed_addr >>>> constant{{.*}}\5C00\22\0A// * constant struct that wraps GPU binary\0A// >>>> CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* } >>>> \0A// CHECK: { i32 1180844977, i32 1, {{.*}}, i8* null }\0A// * variable to >>>> save GPU binary handle after initialization\0A// CHECK: >>>> @__cuda_gpubin_handle = internal global i8** null\0A// * Make sure our >>>> constructor/destructor was added to global 
ctor/dtor list.\0A// CHECK: >>>> @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor\0A// CHECK: >>>> @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor\0A\0A// >>>> Test that we build the correct number of calls to cudaSetupArgument >>>> followed\0A// by a call to cudaLaunch.\0A\0A// CHECK: >>>> define{{.*}}kernelfunc\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK: >>>> call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaSetupArgument\0A// >>>> CHECK: call{{.*}}cudaLaunch\0A__global__ void kernelfunc(int i, int j, int >>>> k) {}\0A\0A// Test that we've built correct kernel launch sequence.\0A// >>>> CHECK: define{{.*}}hostfunc\0A// CHECK: call{{.*}}cudaConfigureCall\0A// >>>> CHEKC: call{{.*}}kernelfunc\0Avoid hostfunc(void) { kernelfunc<<<1, 1>>>(1, >>>> 1, 1); }\0A\0A// Test that we've built a function to register kernels\0A// >>>> CHECK: define internal void @__cuda_register_kernels\0A// CHECK: >>>> call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc\0A\0A// Test that >>>> we've built contructor..\0A// CHECK: define internal void >>>> @__cuda_module_ctor\0A// .. that calls >>>> __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)\0A// CHECK: >>>> call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper\0A// .. stores >>>> return value in __cuda_gpubin_handle\0A// CHECK-NEXT: >>>> store{{.*}}__cuda_gpubin_handle\0A// .. and then calls >>>> __cuda_register_kernels\0A// CHECK-NEXT: call void >>>> @__cuda_register_kernels\0A\0A// Test that we've created destructor.\0A// >>>> CHECK: define internal void @__cuda_module_dtor\0A// CHECK: >>>> load{{.*}}__cuda_gpubin_handle\0A// CHECK-NEXT: call void >>>> @__cudaUnregisterFatBinary\0A\0A\00", align 2 >>>> >>>> ^ >>>> >>>> -- >>>> >>>> ******************** >>>> >>>> *Build environment info:* >>>> > g++ -v >>>> Using built-in specs. >>>> COLLECT_GCC=g++ >>>> COLLECT_LTO_WRAPPER=/usr/lib64/gcc/s390x-suse-linux/4.8/lto-wrapper >>>> Target: s390x-suse-linux >>>> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info >>>> --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 >>>> --enable-languages=c,c++,objc,fortran,obj-c++,java >>>> --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8 >>>> --enable-ssp --disable-libssp --disable-plugin --with-bugurl= >>>> http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' >>>> --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 >>>> --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new >>>> --disable-libstdcxx-pch --enable-version-specific-runtime-libs >>>> --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8 >>>> --without-system-libunwind --with-tune=zEC12 --with-arch=z196 >>>> --with-long-double-128 --enable-decimal-float --build=s390x-suse-linux >>>> --host=s390x-suse-linux >>>> Thread model: posix >>>> gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE >>>> Linux) >>>> >>>> Thanks, >>>> >>>> >>>> Hubert Tong >>>> >>>> On Thu, May 7, 2015 at 2:34 PM, Artem Belevich <[email protected]> wrote: >>>> >>>>> Author: tra >>>>> Date: Thu May 7 14:34:16 2015 >>>>> New Revision: 236765 >>>>> >>>>> URL: http://llvm.org/viewvc/llvm-project?rev=236765&view=rev >>>>> Log: >>>>> [cuda] Include GPU binary into host object file and generate >>>>> init/deinit code. >>>>> >>>>> - added -fcuda-include-gpubinary option to incorporate results of >>>>> device-side compilation into host-side one. 
>>>>> - generate code to register GPU binaries and associated kernels >>>>> with CUDA runtime and clean-up on exit. >>>>> - added test case for init/deinit code generation. >>>>> >>>>> Differential Revision: http://reviews.llvm.org/D9507 >>>>> >>>>> Modified: >>>>> cfe/trunk/include/clang/Driver/CC1Options.td >>>>> cfe/trunk/include/clang/Frontend/CodeGenOptions.h >>>>> cfe/trunk/lib/CodeGen/CGCUDANV.cpp >>>>> cfe/trunk/lib/CodeGen/CGCUDARuntime.h >>>>> cfe/trunk/lib/CodeGen/CodeGenFunction.cpp >>>>> cfe/trunk/lib/CodeGen/CodeGenModule.cpp >>>>> cfe/trunk/lib/Frontend/CompilerInvocation.cpp >>>>> cfe/trunk/test/CodeGenCUDA/device-stub.cu >>>>> >>>>> Modified: cfe/trunk/include/clang/Driver/CC1Options.td >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/CC1Options.td?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/include/clang/Driver/CC1Options.td (original) >>>>> +++ cfe/trunk/include/clang/Driver/CC1Options.td Thu May 7 14:34:16 >>>>> 2015 >>>>> @@ -631,6 +631,8 @@ def fcuda_allow_host_calls_from_host_dev >>>>> def fcuda_disable_target_call_checks : Flag<["-"], >>>>> "fcuda-disable-target-call-checks">, >>>>> HelpText<"Disable all cross-target (host, device, etc.) call checks >>>>> in CUDA">; >>>>> +def fcuda_include_gpubinary : Separate<["-"], >>>>> "fcuda-include-gpubinary">, >>>>> + HelpText<"Incorporate CUDA device-side binary into host object >>>>> file.">; >>>>> >>>>> } // let Flags = [CC1Option] >>>>> >>>>> >>>>> Modified: cfe/trunk/include/clang/Frontend/CodeGenOptions.h >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Frontend/CodeGenOptions.h?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/include/clang/Frontend/CodeGenOptions.h (original) >>>>> +++ cfe/trunk/include/clang/Frontend/CodeGenOptions.h Thu May 7 >>>>> 14:34:16 2015 >>>>> @@ -163,6 +163,11 @@ public: >>>>> /// Name of the profile file to use as input for -fprofile-instr-use >>>>> std::string InstrProfileInput; >>>>> >>>>> + /// A list of file names passed with -fcuda-include-gpubinary >>>>> options to >>>>> + /// forward to CUDA runtime back-end for incorporating them into >>>>> host-side >>>>> + /// object file. >>>>> + std::vector<std::string> CudaGpuBinaryFileNames; >>>>> + >>>>> /// Regular expression to select optimizations for which we should >>>>> enable >>>>> /// optimization remarks. 
Transformation passes whose name matches >>>>> this >>>>> /// expression (and support this feature), will emit a diagnostic >>>>> >>>>> Modified: cfe/trunk/lib/CodeGen/CGCUDANV.cpp >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDANV.cpp?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/lib/CodeGen/CGCUDANV.cpp (original) >>>>> +++ cfe/trunk/lib/CodeGen/CGCUDANV.cpp Thu May 7 14:34:16 2015 >>>>> @@ -20,7 +20,6 @@ >>>>> #include "llvm/IR/CallSite.h" >>>>> #include "llvm/IR/Constants.h" >>>>> #include "llvm/IR/DerivedTypes.h" >>>>> -#include <vector> >>>>> >>>>> using namespace clang; >>>>> using namespace CodeGen; >>>>> @@ -30,29 +29,66 @@ namespace { >>>>> class CGNVCUDARuntime : public CGCUDARuntime { >>>>> >>>>> private: >>>>> - llvm::Type *IntTy, *SizeTy; >>>>> - llvm::PointerType *CharPtrTy, *VoidPtrTy; >>>>> + llvm::Type *IntTy, *SizeTy, *VoidTy; >>>>> + llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy; >>>>> + >>>>> + /// Convenience reference to LLVM Context >>>>> + llvm::LLVMContext &Context; >>>>> + /// Convenience reference to the current module >>>>> + llvm::Module &TheModule; >>>>> + /// Keeps track of kernel launch stubs emitted in this module >>>>> + llvm::SmallVector<llvm::Function *, 16> EmittedKernels; >>>>> + /// Keeps track of variables containing handles of GPU binaries. >>>>> Populated by >>>>> + /// ModuleCtorFunction() and used to create corresponding cleanup >>>>> calls in >>>>> + /// ModuleDtorFunction() >>>>> + llvm::SmallVector<llvm::GlobalVariable *, 16> GpuBinaryHandles; >>>>> >>>>> llvm::Constant *getSetupArgumentFn() const; >>>>> llvm::Constant *getLaunchFn() const; >>>>> >>>>> + /// Creates a function to register all kernel stubs generated in >>>>> this module. >>>>> + llvm::Function *makeRegisterKernelsFn(); >>>>> + >>>>> + /// Helper function that generates a constant string and returns a >>>>> pointer to >>>>> + /// the start of the string. The result of this function can be >>>>> used anywhere >>>>> + /// where the C code specifies const char*. 
>>>>> + llvm::Constant *makeConstantString(const std::string &Str, >>>>> + const std::string &Name = "", >>>>> + unsigned Alignment = 0) { >>>>> + llvm::Constant *Zeros[] = {llvm::ConstantInt::get(SizeTy, 0), >>>>> + llvm::ConstantInt::get(SizeTy, 0)}; >>>>> + auto *ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str()); >>>>> + return >>>>> llvm::ConstantExpr::getGetElementPtr(ConstStr->getValueType(), >>>>> + ConstStr, Zeros); >>>>> + } >>>>> + >>>>> + void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList >>>>> &Args); >>>>> + >>>>> public: >>>>> CGNVCUDARuntime(CodeGenModule &CGM); >>>>> >>>>> - void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList >>>>> &Args) override; >>>>> + void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) >>>>> override; >>>>> + /// Creates module constructor function >>>>> + llvm::Function *makeModuleCtorFunction() override; >>>>> + /// Creates module destructor function >>>>> + llvm::Function *makeModuleDtorFunction() override; >>>>> }; >>>>> >>>>> } >>>>> >>>>> -CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) : >>>>> CGCUDARuntime(CGM) { >>>>> +CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) >>>>> + : CGCUDARuntime(CGM), Context(CGM.getLLVMContext()), >>>>> + TheModule(CGM.getModule()) { >>>>> CodeGen::CodeGenTypes &Types = CGM.getTypes(); >>>>> ASTContext &Ctx = CGM.getContext(); >>>>> >>>>> IntTy = Types.ConvertType(Ctx.IntTy); >>>>> SizeTy = Types.ConvertType(Ctx.getSizeType()); >>>>> + VoidTy = llvm::Type::getVoidTy(Context); >>>>> >>>>> CharPtrTy = >>>>> llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy)); >>>>> VoidPtrTy = >>>>> cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy)); >>>>> + VoidPtrPtrTy = VoidPtrTy->getPointerTo(); >>>>> } >>>>> >>>>> llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const { >>>>> @@ -68,14 +104,17 @@ llvm::Constant *CGNVCUDARuntime::getSetu >>>>> >>>>> llvm::Constant *CGNVCUDARuntime::getLaunchFn() const { >>>>> // cudaError_t cudaLaunch(char *) >>>>> - std::vector<llvm::Type*> Params; >>>>> - Params.push_back(CharPtrTy); >>>>> - return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy, >>>>> - Params, >>>>> false), >>>>> - "cudaLaunch"); >>>>> + return CGM.CreateRuntimeFunction( >>>>> + llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch"); >>>>> +} >>>>> + >>>>> +void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF, >>>>> + FunctionArgList &Args) { >>>>> + EmittedKernels.push_back(CGF.CurFn); >>>>> + emitDeviceStubBody(CGF, Args); >>>>> } >>>>> >>>>> -void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF, >>>>> +void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF, >>>>> FunctionArgList &Args) { >>>>> // Build the argument value list and the argument stack struct type. 
>>>>> SmallVector<llvm::Value *, 16> ArgValues; >>>>> @@ -87,8 +126,7 @@ void CGNVCUDARuntime::EmitDeviceStubBody >>>>> assert(isa<llvm::PointerType>(V->getType()) && "Arg type not >>>>> PointerType"); >>>>> >>>>> >>>>> ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType()); >>>>> } >>>>> - llvm::StructType *ArgStackTy = llvm::StructType::get( >>>>> - CGF.getLLVMContext(), ArgTypes); >>>>> + llvm::StructType *ArgStackTy = llvm::StructType::get(Context, >>>>> ArgTypes); >>>>> >>>>> llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end"); >>>>> >>>>> @@ -120,6 +158,160 @@ void CGNVCUDARuntime::EmitDeviceStubBody >>>>> CGF.EmitBlock(EndBlock); >>>>> } >>>>> >>>>> +/// Creates internal function to register all kernel stubs generated >>>>> in this >>>>> +/// module with the CUDA runtime. >>>>> +/// \code >>>>> +/// void __cuda_register_kernels(void** GpuBinaryHandle) { >>>>> +/// __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...); >>>>> +/// ... >>>>> +/// __cudaRegisterFunction(GpuBinaryHandle,KernelM,...); >>>>> +/// } >>>>> +/// \endcode >>>>> +llvm::Function *CGNVCUDARuntime::makeRegisterKernelsFn() { >>>>> + llvm::Function *RegisterKernelsFunc = llvm::Function::Create( >>>>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false), >>>>> + llvm::GlobalValue::InternalLinkage, "__cuda_register_kernels", >>>>> &TheModule); >>>>> + llvm::BasicBlock *EntryBB = >>>>> + llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc); >>>>> + CGBuilderTy Builder(Context); >>>>> + Builder.SetInsertPoint(EntryBB); >>>>> + >>>>> + // void __cudaRegisterFunction(void **, const char *, char *, const >>>>> char *, >>>>> + // int, uint3*, uint3*, dim3*, dim3*, >>>>> int*) >>>>> + std::vector<llvm::Type *> RegisterFuncParams = { >>>>> + VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy, >>>>> + VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, >>>>> IntTy->getPointerTo()}; >>>>> + llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction( >>>>> + llvm::FunctionType::get(IntTy, RegisterFuncParams, false), >>>>> + "__cudaRegisterFunction"); >>>>> + >>>>> + // Extract GpuBinaryHandle passed as the first argument passed to >>>>> + // __cuda_register_kernels() and generate __cudaRegisterFunction() >>>>> call for >>>>> + // each emitted kernel. >>>>> + llvm::Argument &GpuBinaryHandlePtr = >>>>> *RegisterKernelsFunc->arg_begin(); >>>>> + for (llvm::Function *Kernel : EmittedKernels) { >>>>> + llvm::Constant *KernelName = >>>>> makeConstantString(Kernel->getName()); >>>>> + llvm::Constant *NullPtr = >>>>> llvm::ConstantPointerNull::get(VoidPtrTy); >>>>> + llvm::Value *args[] = { >>>>> + &GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy), >>>>> + KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), >>>>> NullPtr, >>>>> + NullPtr, NullPtr, NullPtr, >>>>> + llvm::ConstantPointerNull::get(IntTy->getPointerTo())}; >>>>> + Builder.CreateCall(RegisterFunc, args); >>>>> + } >>>>> + >>>>> + Builder.CreateRetVoid(); >>>>> + return RegisterKernelsFunc; >>>>> +} >>>>> + >>>>> +/// Creates a global constructor function for the module: >>>>> +/// \code >>>>> +/// void __cuda_module_ctor(void*) { >>>>> +/// Handle0 = __cudaRegisterFatBinary(GpuBinaryBlob0); >>>>> +/// __cuda_register_kernels(Handle0); >>>>> +/// ... 
>>>>> +/// HandleN = __cudaRegisterFatBinary(GpuBinaryBlobN); >>>>> +/// __cuda_register_kernels(HandleN); >>>>> +/// } >>>>> +/// \endcode >>>>> +llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { >>>>> + // void __cuda_register_kernels(void* handle); >>>>> + llvm::Function *RegisterKernelsFunc = makeRegisterKernelsFn(); >>>>> + // void ** __cudaRegisterFatBinary(void *); >>>>> + llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction( >>>>> + llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false), >>>>> + "__cudaRegisterFatBinary"); >>>>> + // struct { int magic, int version, void * gpu_binary, void * >>>>> dont_care }; >>>>> + llvm::StructType *FatbinWrapperTy = >>>>> + llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy, >>>>> nullptr); >>>>> + >>>>> + llvm::Function *ModuleCtorFunc = llvm::Function::Create( >>>>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false), >>>>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor", >>>>> &TheModule); >>>>> + llvm::BasicBlock *CtorEntryBB = >>>>> + llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc); >>>>> + CGBuilderTy CtorBuilder(Context); >>>>> + >>>>> + CtorBuilder.SetInsertPoint(CtorEntryBB); >>>>> + >>>>> + // For each GPU binary, register it with the CUDA runtime and store >>>>> returned >>>>> + // handle in a global variable and save the handle in >>>>> GpuBinaryHandles vector >>>>> + // to be cleaned up in destructor on exit. Then associate all known >>>>> kernels >>>>> + // with the GPU binary handle so CUDA runtime can figure out what >>>>> to call on >>>>> + // the GPU side. >>>>> + for (const std::string &GpuBinaryFileName : >>>>> + CGM.getCodeGenOpts().CudaGpuBinaryFileNames) { >>>>> + llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr >>>>> = >>>>> + llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName); >>>>> + if (std::error_code EC = GpuBinaryOrErr.getError()) { >>>>> + CGM.getDiags().Report(diag::err_cannot_open_file) << >>>>> GpuBinaryFileName >>>>> + << >>>>> EC.message(); >>>>> + continue; >>>>> + } >>>>> + >>>>> + // Create initialized wrapper structure that points to the loaded >>>>> GPU binary >>>>> + llvm::Constant *Values[] = { >>>>> + llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper >>>>> magic. >>>>> + llvm::ConstantInt::get(IntTy, 1), // Fatbin version. >>>>> + makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", >>>>> 16), // Data. >>>>> + llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in >>>>> fatbin v1. >>>>> + llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable( >>>>> + TheModule, FatbinWrapperTy, true, >>>>> llvm::GlobalValue::InternalLinkage, >>>>> + llvm::ConstantStruct::get(FatbinWrapperTy, Values), >>>>> + "__cuda_fatbin_wrapper"); >>>>> + >>>>> + // GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper); >>>>> + llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall( >>>>> + RegisterFatbinFunc, >>>>> + CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy)); >>>>> + llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable( >>>>> + TheModule, VoidPtrPtrTy, false, >>>>> llvm::GlobalValue::InternalLinkage, >>>>> + llvm::ConstantPointerNull::get(VoidPtrPtrTy), >>>>> "__cuda_gpubin_handle"); >>>>> + CtorBuilder.CreateStore(RegisterFatbinCall, GpuBinaryHandle, >>>>> false); >>>>> + >>>>> + // Call __cuda_register_kernels(GpuBinaryHandle); >>>>> + CtorBuilder.CreateCall(RegisterKernelsFunc, RegisterFatbinCall); >>>>> + >>>>> + // Save GpuBinaryHandle so we can unregister it in destructor. 
>>>>> + GpuBinaryHandles.push_back(GpuBinaryHandle); >>>>> + } >>>>> + >>>>> + CtorBuilder.CreateRetVoid(); >>>>> + return ModuleCtorFunc; >>>>> +} >>>>> + >>>>> +/// Creates a global destructor function that unregisters all GPU >>>>> code blobs >>>>> +/// registered by constructor. >>>>> +/// \code >>>>> +/// void __cuda_module_dtor(void*) { >>>>> +/// __cudaUnregisterFatBinary(Handle0); >>>>> +/// ... >>>>> +/// __cudaUnregisterFatBinary(HandleN); >>>>> +/// } >>>>> +/// \endcode >>>>> +llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() { >>>>> + // void __cudaUnregisterFatBinary(void ** handle); >>>>> + llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction( >>>>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false), >>>>> + "__cudaUnregisterFatBinary"); >>>>> + >>>>> + llvm::Function *ModuleDtorFunc = llvm::Function::Create( >>>>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false), >>>>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor", >>>>> &TheModule); >>>>> + llvm::BasicBlock *DtorEntryBB = >>>>> + llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc); >>>>> + CGBuilderTy DtorBuilder(Context); >>>>> + DtorBuilder.SetInsertPoint(DtorEntryBB); >>>>> + >>>>> + for (llvm::GlobalVariable *GpuBinaryHandle : GpuBinaryHandles) { >>>>> + DtorBuilder.CreateCall(UnregisterFatbinFunc, >>>>> + DtorBuilder.CreateLoad(GpuBinaryHandle, >>>>> false)); >>>>> + } >>>>> + >>>>> + DtorBuilder.CreateRetVoid(); >>>>> + return ModuleDtorFunc; >>>>> +} >>>>> + >>>>> CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) { >>>>> return new CGNVCUDARuntime(CGM); >>>>> } >>>>> >>>>> Modified: cfe/trunk/lib/CodeGen/CGCUDARuntime.h >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDARuntime.h?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/lib/CodeGen/CGCUDARuntime.h (original) >>>>> +++ cfe/trunk/lib/CodeGen/CGCUDARuntime.h Thu May 7 14:34:16 2015 >>>>> @@ -16,6 +16,10 @@ >>>>> #ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H >>>>> #define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H >>>>> >>>>> +namespace llvm { >>>>> +class Function; >>>>> +} >>>>> + >>>>> namespace clang { >>>>> >>>>> class CUDAKernelCallExpr; >>>>> @@ -39,10 +43,17 @@ public: >>>>> virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF, >>>>> const CUDAKernelCallExpr *E, >>>>> ReturnValueSlot ReturnValue); >>>>> - >>>>> - virtual void EmitDeviceStubBody(CodeGenFunction &CGF, >>>>> - FunctionArgList &Args) = 0; >>>>> >>>>> + /// Emits a kernel launch stub. >>>>> + virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList >>>>> &Args) = 0; >>>>> + >>>>> + /// Constructs and returns a module initialization function or >>>>> nullptr if it's >>>>> + /// not needed. Must be called after all kernels have been emitted. >>>>> + virtual llvm::Function *makeModuleCtorFunction() = 0; >>>>> + >>>>> + /// Returns a module cleanup function or nullptr if it's not needed. >>>>> + /// Must be called after ModuleCtorFunction >>>>> + virtual llvm::Function *makeModuleDtorFunction() = 0; >>>>> }; >>>>> >>>>> /// Creates an instance of a CUDA runtime class. 
>>>>> >>>>> Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.cpp >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.cpp?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/lib/CodeGen/CodeGenFunction.cpp (original) >>>>> +++ cfe/trunk/lib/CodeGen/CodeGenFunction.cpp Thu May 7 14:34:16 2015 >>>>> @@ -878,7 +878,7 @@ void CodeGenFunction::GenerateCode(Globa >>>>> else if (getLangOpts().CUDA && >>>>> !getLangOpts().CUDAIsDevice && >>>>> FD->hasAttr<CUDAGlobalAttr>()) >>>>> - CGM.getCUDARuntime().EmitDeviceStubBody(*this, Args); >>>>> + CGM.getCUDARuntime().emitDeviceStub(*this, Args); >>>>> else if (isa<CXXConversionDecl>(FD) && >>>>> >>>>> cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) { >>>>> // The lambda conversion to block pointer is special; the >>>>> semantics can't be >>>>> >>>>> Modified: cfe/trunk/lib/CodeGen/CodeGenModule.cpp >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenModule.cpp?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/lib/CodeGen/CodeGenModule.cpp (original) >>>>> +++ cfe/trunk/lib/CodeGen/CodeGenModule.cpp Thu May 7 14:34:16 2015 >>>>> @@ -350,6 +350,13 @@ void CodeGenModule::Release() { >>>>> if (ObjCRuntime) >>>>> if (llvm::Function *ObjCInitFunction = >>>>> ObjCRuntime->ModuleInitFunction()) >>>>> AddGlobalCtor(ObjCInitFunction); >>>>> + if (Context.getLangOpts().CUDA && >>>>> !Context.getLangOpts().CUDAIsDevice && >>>>> + CUDARuntime) { >>>>> + if (llvm::Function *CudaCtorFunction = >>>>> CUDARuntime->makeModuleCtorFunction()) >>>>> + AddGlobalCtor(CudaCtorFunction); >>>>> + if (llvm::Function *CudaDtorFunction = >>>>> CUDARuntime->makeModuleDtorFunction()) >>>>> + AddGlobalDtor(CudaDtorFunction); >>>>> + } >>>>> if (PGOReader && PGOStats.hasDiagnostics()) >>>>> PGOStats.reportDiagnostics(getDiags(), >>>>> getCodeGenOpts().MainFileName); >>>>> EmitCtorList(GlobalCtors, "llvm.global_ctors"); >>>>> @@ -3678,4 +3685,3 @@ void CodeGenModule::EmitOMPThreadPrivate >>>>> CXXGlobalInits.push_back(InitFunction); >>>>> } >>>>> } >>>>> - >>>>> >>>>> Modified: cfe/trunk/lib/Frontend/CompilerInvocation.cpp >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInvocation.cpp?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/lib/Frontend/CompilerInvocation.cpp (original) >>>>> +++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp Thu May 7 14:34:16 >>>>> 2015 >>>>> @@ -651,6 +651,9 @@ static bool ParseCodeGenArgs(CodeGenOpti >>>>> Args.getAllArgValues(OPT_fsanitize_recover_EQ), >>>>> Diags, >>>>> Opts.SanitizeRecover); >>>>> >>>>> + Opts.CudaGpuBinaryFileNames = >>>>> + Args.getAllArgValues(OPT_fcuda_include_gpubinary); >>>>> + >>>>> return Success; >>>>> } >>>>> >>>>> >>>>> Modified: cfe/trunk/test/CodeGenCUDA/device-stub.cu >>>>> URL: >>>>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/device-stub.cu?rev=236765&r1=236764&r2=236765&view=diff >>>>> >>>>> ============================================================================== >>>>> --- cfe/trunk/test/CodeGenCUDA/device-stub.cu (original) >>>>> +++ cfe/trunk/test/CodeGenCUDA/device-stub.cu Thu May 7 14:34:16 2015 >>>>> @@ -1,7 +1,21 @@ >>>>> -// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck 
%s >>>>> +// RUN: %clang_cc1 -emit-llvm %s -fcuda-include-gpubinary %s -o - | >>>>> FileCheck %s >>>>> >>>>> #include "Inputs/cuda.h" >>>>> >>>>> +// Make sure that all parts of GPU code init/cleanup are there: >>>>> +// * constant unnamed string with the kernel name >>>>> +// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", >>>>> align 1 >>>>> +// * constant unnamed string with GPU binary >>>>> +// CHECK: private unnamed_addr constant{{.*}}\00" >>>>> +// * constant struct that wraps GPU binary >>>>> +// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, >>>>> i8* } >>>>> +// CHECK: { i32 1180844977, i32 1, {{.*}}, i64 0, i64 0), i8* >>>>> null } >>>>> +// * variable to save GPU binary handle after initialization >>>>> +// CHECK: @__cuda_gpubin_handle = internal global i8** null >>>>> +// * Make sure our constructor/destructor was added to global >>>>> ctor/dtor list. >>>>> +// CHECK: @llvm.global_ctors = appending global >>>>> {{.*}}@__cuda_module_ctor >>>>> +// CHECK: @llvm.global_dtors = appending global >>>>> {{.*}}@__cuda_module_dtor >>>>> + >>>>> // Test that we build the correct number of calls to >>>>> cudaSetupArgument followed >>>>> // by a call to cudaLaunch. >>>>> >>>>> @@ -11,3 +25,28 @@ >>>>> // CHECK: call{{.*}}cudaSetupArgument >>>>> // CHECK: call{{.*}}cudaLaunch >>>>> __global__ void kernelfunc(int i, int j, int k) {} >>>>> + >>>>> +// Test that we've built correct kernel launch sequence. >>>>> +// CHECK: define{{.*}}hostfunc >>>>> +// CHECK: call{{.*}}cudaConfigureCall >>>>> +// CHEKC: call{{.*}}kernelfunc >>>>> +void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); } >>>>> + >>>>> +// Test that we've built a function to register kernels >>>>> +// CHECK: define internal void @__cuda_register_kernels >>>>> +// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc >>>>> + >>>>> +// Test that we've built contructor.. >>>>> +// CHECK: define internal void @__cuda_module_ctor >>>>> +// .. that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper) >>>>> +// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper >>>>> +// .. stores return value in __cuda_gpubin_handle >>>>> +// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle >>>>> +// .. and then calls __cuda_register_kernels >>>>> +// CHECK-NEXT: call void @__cuda_register_kernels >>>>> + >>>>> +// Test that we've created destructor. >>>>> +// CHECK: define internal void @__cuda_module_dtor >>>>> +// CHECK: load{{.*}}__cuda_gpubin_handle >>>>> +// CHECK-NEXT: call void @__cudaUnregisterFatBinary >>>>> + >>>>> >>>>> >>>>> _______________________________________________ >>>>> cfe-commits mailing list >>>>> [email protected] >>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits >>>>> >>>> >>>> >>> >>> >>> -- >>> --Artem Belevich >>> >> >> >> >> -- >> --Artem Belevich >> > > > > -- > --Artem Belevich >
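For readers skimming the patch above: the module constructor/destructor that r236765 emits correspond, roughly, to the following hand-written host-side C++. This is only an illustrative sketch derived from the \code comments and runtime-function declarations in CGCUDANV.cpp above; the FatbinWrapper, ModuleCtor, RegisterKernels, GpuBinaryBlob, and KernelStubAddr names are placeholders, not symbols the compiler actually emits (the generated IR uses __cuda_fatbin_wrapper, __cuda_module_ctor, __cuda_register_kernels, and __cuda_gpubin_handle, as the test checks).

// Illustrative sketch only: a hand-written equivalent of the IR that
// makeModuleCtorFunction()/makeModuleDtorFunction() generate, reduced to a
// single GPU binary and a single kernel. Names below are placeholders.

// Mirrors the { i32, i32, i8*, i8* } __cuda_fatbin_wrapper constant
// (magic 0x466243b1, version 1, data pointer, field unused in fatbin v1).
struct FatbinWrapper {
  int Magic;
  int Version;
  const void *Data;   // GPU binary blob embedded via -fcuda-include-gpubinary
  void *Unused;       // unused in fatbin v1
};

// CUDA runtime entry points as declared by CGNVCUDARuntime in the patch.
extern "C" void **__cudaRegisterFatBinary(void *);
extern "C" void __cudaUnregisterFatBinary(void **);
extern "C" int __cudaRegisterFunction(void **, const char *, char *,
                                      const char *, int, void *, void *,
                                      void *, void *, int *);

extern const char GpuBinaryBlob[];   // placeholder for the embedded blob
extern void *KernelStubAddr;         // placeholder for the host-side kernel stub

static FatbinWrapper Wrapper = {0x466243b1, 1, GpuBinaryBlob, nullptr};
static void **GpuBinHandle;          // plays the role of __cuda_gpubin_handle

// Equivalent of __cuda_register_kernels: one __cudaRegisterFunction call per
// emitted kernel, passing the kernel name for both device-side name arguments.
static void RegisterKernels(void **Handle) {
  __cudaRegisterFunction(Handle, (const char *)KernelStubAddr,
                         const_cast<char *>("kernelfunc"), "kernelfunc", -1,
                         nullptr, nullptr, nullptr, nullptr, nullptr);
}

// Equivalent of __cuda_module_ctor; AddGlobalCtor() places the real one on
// llvm.global_ctors.
static void ModuleCtor(void *) {
  GpuBinHandle = __cudaRegisterFatBinary(&Wrapper);
  RegisterKernels(GpuBinHandle);
}

// Equivalent of __cuda_module_dtor; AddGlobalDtor() places the real one on
// llvm.global_dtors.
static void ModuleDtor(void *) {
  __cudaUnregisterFatBinary(GpuBinHandle);
}

The real constructor loops over every file passed with -fcuda-include-gpubinary and over every emitted kernel, and saves each handle in GpuBinaryHandles so the destructor can unregister them all; the sketch collapses that to one binary and one kernel for brevity.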
