It looks like on s390x, string constants are emitted with 'align 2' rather than 'align 1'. I'll drop the alignment from the CHECK constraint, which should make the test pass on s390x as well. I'll commit the fix shortly.
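A minimal sketch of what that fix would look like (assuming only the first failing CHECK line, quoted in the error output below, needs relaxing; the same treatment would apply to any other line that checks alignment):

```diff
-// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1
+// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00"
```

Dropping the trailing `, align 1` keeps the check for the kernel-name string itself while letting each target use whatever alignment its ABI prefers.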
--Artem On Mon, May 11, 2015 at 10:43 AM, Artem Belevich <[email protected]> wrote: > Could you send me output of the CC1 executed by the test before it's piped > into FileCheck? > > /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 -internal-isystem > /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include > -nostdsysteminc -emit-llvm /gsa/tlbgsa-h1/08/hstong/pub/ > cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu -fcuda-include-gpubinary > /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ > device-stub.cu -o - > > Oh, and I see a typo in the script -- "CHEKC: call{{.*}}kernelfunc", > though it's probably not what breaks the test in your case. > > --Artem > > On Sat, May 9, 2015 at 11:10 AM, Hubert Tong < > [email protected]> wrote: > >> Hi Artem, >> >> I am encountering a failure with device-stub.cu on s390x-suse-linux. Can >> you take a look? >> >> *Output:* >> FAIL: Clang :: CodeGenCUDA/device-stub.cu (1986 of 21893) >> ******************** TEST 'Clang :: CodeGenCUDA/device-stub.cu' FAILED >> ******************** >> Script: >> -- >> /scratch/hstong/workdir/Release+Asserts/bin/clang -cc1 -internal-isystem >> /scratch/hstong/workdir/Release+Asserts/bin/../lib/clang/3.7.0/include >> -nostdsysteminc -emit-llvm >> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >> device-stub.cu -fcuda-include-gpubinary >> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >> device-stub.cu -o - | >> /scratch/hstong/workdir/Release+Asserts/bin/FileCheck >> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >> device-stub.cu >> -- >> Exit Code: 1 >> >> Command Output (stderr): >> -- >> >> /gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/device-stub.cu:7:11: >> error: expected string not found in input >> // CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", align 1 >> ^ >> <stdin>:1:1: note: scanning from here >> ; ModuleID = >> 
'/gsa/tlbgsa-h1/08/hstong/pub/cfe_trunk/clang/test/CodeGenCUDA/ >> device-stub.cu' >> ^ >> <stdin>:13:298: note: possible intended match here >> @1 = private unnamed_addr constant [2259 x i8] c"// RUN: %clang_cc1 >> -emit-llvm %s -fcuda-include-gpubinary %s -o - | FileCheck %s\0A\0A#include >> \22Inputs/cuda.h\22\0A\0A// Make sure that all parts of GPU code >> init/cleanup are there:\0A// * constant unnamed string with the kernel >> name\0A// CHECK: private unnamed_addr >> constant{{.*}}kernelfunc{{.*}}\5C00\22, align 1\0A// * constant unnamed >> string with GPU binary\0A// CHECK: private unnamed_addr >> constant{{.*}}\5C00\22\0A// * constant struct that wraps GPU binary\0A// >> CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, i8* } >> \0A// CHECK: { i32 1180844977, i32 1, {{.*}}, i8* null }\0A// * variable to >> save GPU binary handle after initialization\0A// CHECK: >> @__cuda_gpubin_handle = internal global i8** null\0A// * Make sure our >> constructor/destructor was added to global ctor/dtor list.\0A// CHECK: >> @llvm.global_ctors = appending global {{.*}}@__cuda_module_ctor\0A// CHECK: >> @llvm.global_dtors = appending global {{.*}}@__cuda_module_dtor\0A\0A// >> Test that we build the correct number of calls to cudaSetupArgument >> followed\0A// by a call to cudaLaunch.\0A\0A// CHECK: >> define{{.*}}kernelfunc\0A// CHECK: call{{.*}}cudaSetupArgument\0A// CHECK: >> call{{.*}}cudaSetupArgument\0A// CHECK: call{{.*}}cudaSetupArgument\0A// >> CHECK: call{{.*}}cudaLaunch\0A__global__ void kernelfunc(int i, int j, int >> k) {}\0A\0A// Test that we've built correct kernel launch sequence.\0A// >> CHECK: define{{.*}}hostfunc\0A// CHECK: call{{.*}}cudaConfigureCall\0A// >> CHEKC: call{{.*}}kernelfunc\0Avoid hostfunc(void) { kernelfunc<<<1, 1>>>(1, >> 1, 1); }\0A\0A// Test that we've built a function to register kernels\0A// >> CHECK: define internal void @__cuda_register_kernels\0A// CHECK: >> call{{.*}}cudaRegisterFunction(i8** %0, 
{{.*}}kernelfunc\0A\0A// Test that >> we've built contructor..\0A// CHECK: define internal void >> @__cuda_module_ctor\0A// .. that calls >> __cudaRegisterFatBinary(&__cuda_fatbin_wrapper)\0A// CHECK: >> call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper\0A// .. stores >> return value in __cuda_gpubin_handle\0A// CHECK-NEXT: >> store{{.*}}__cuda_gpubin_handle\0A// .. and then calls >> __cuda_register_kernels\0A// CHECK-NEXT: call void >> @__cuda_register_kernels\0A\0A// Test that we've created destructor.\0A// >> CHECK: define internal void @__cuda_module_dtor\0A// CHECK: >> load{{.*}}__cuda_gpubin_handle\0A// CHECK-NEXT: call void >> @__cudaUnregisterFatBinary\0A\0A\00", align 2 >> >> ^ >> >> -- >> >> ******************** >> >> *Build environment info:* >> > g++ -v >> Using built-in specs. >> COLLECT_GCC=g++ >> COLLECT_LTO_WRAPPER=/usr/lib64/gcc/s390x-suse-linux/4.8/lto-wrapper >> Target: s390x-suse-linux >> Configured with: ../configure --prefix=/usr --infodir=/usr/share/info >> --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 >> --enable-languages=c,c++,objc,fortran,obj-c++,java >> --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.8 >> --enable-ssp --disable-libssp --disable-plugin --with-bugurl= >> http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' >> --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 >> --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new >> --disable-libstdcxx-pch --enable-version-specific-runtime-libs >> --enable-linker-build-id --enable-linux-futex --program-suffix=-4.8 >> --without-system-libunwind --with-tune=zEC12 --with-arch=z196 >> --with-long-double-128 --enable-decimal-float --build=s390x-suse-linux >> --host=s390x-suse-linux >> Thread model: posix >> gcc version 4.8.3 20140627 [gcc-4_8-branch revision 212064] (SUSE Linux) >> >> Thanks, >> >> >> Hubert Tong >> >> On Thu, May 7, 2015 at 2:34 PM, Artem Belevich <[email protected]> wrote: >> >>> Author: tra 
>>> Date: Thu May 7 14:34:16 2015 >>> New Revision: 236765 >>> >>> URL: http://llvm.org/viewvc/llvm-project?rev=236765&view=rev >>> Log: >>> [cuda] Include GPU binary into host object file and generate init/deinit >>> code. >>> >>> - added -fcuda-include-gpubinary option to incorporate results of >>> device-side compilation into host-side one. >>> - generate code to register GPU binaries and associated kernels >>> with CUDA runtime and clean-up on exit. >>> - added test case for init/deinit code generation. >>> >>> Differential Revision: http://reviews.llvm.org/D9507 >>> >>> Modified: >>> cfe/trunk/include/clang/Driver/CC1Options.td >>> cfe/trunk/include/clang/Frontend/CodeGenOptions.h >>> cfe/trunk/lib/CodeGen/CGCUDANV.cpp >>> cfe/trunk/lib/CodeGen/CGCUDARuntime.h >>> cfe/trunk/lib/CodeGen/CodeGenFunction.cpp >>> cfe/trunk/lib/CodeGen/CodeGenModule.cpp >>> cfe/trunk/lib/Frontend/CompilerInvocation.cpp >>> cfe/trunk/test/CodeGenCUDA/device-stub.cu >>> >>> Modified: cfe/trunk/include/clang/Driver/CC1Options.td >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Driver/CC1Options.td?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/include/clang/Driver/CC1Options.td (original) >>> +++ cfe/trunk/include/clang/Driver/CC1Options.td Thu May 7 14:34:16 2015 >>> @@ -631,6 +631,8 @@ def fcuda_allow_host_calls_from_host_dev >>> def fcuda_disable_target_call_checks : Flag<["-"], >>> "fcuda-disable-target-call-checks">, >>> HelpText<"Disable all cross-target (host, device, etc.) 
call checks >>> in CUDA">; >>> +def fcuda_include_gpubinary : Separate<["-"], >>> "fcuda-include-gpubinary">, >>> + HelpText<"Incorporate CUDA device-side binary into host object >>> file.">; >>> >>> } // let Flags = [CC1Option] >>> >>> >>> Modified: cfe/trunk/include/clang/Frontend/CodeGenOptions.h >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/include/clang/Frontend/CodeGenOptions.h?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/include/clang/Frontend/CodeGenOptions.h (original) >>> +++ cfe/trunk/include/clang/Frontend/CodeGenOptions.h Thu May 7 >>> 14:34:16 2015 >>> @@ -163,6 +163,11 @@ public: >>> /// Name of the profile file to use as input for -fprofile-instr-use >>> std::string InstrProfileInput; >>> >>> + /// A list of file names passed with -fcuda-include-gpubinary options >>> to >>> + /// forward to CUDA runtime back-end for incorporating them into >>> host-side >>> + /// object file. >>> + std::vector<std::string> CudaGpuBinaryFileNames; >>> + >>> /// Regular expression to select optimizations for which we should >>> enable >>> /// optimization remarks. 
Transformation passes whose name matches >>> this >>> /// expression (and support this feature), will emit a diagnostic >>> >>> Modified: cfe/trunk/lib/CodeGen/CGCUDANV.cpp >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDANV.cpp?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/lib/CodeGen/CGCUDANV.cpp (original) >>> +++ cfe/trunk/lib/CodeGen/CGCUDANV.cpp Thu May 7 14:34:16 2015 >>> @@ -20,7 +20,6 @@ >>> #include "llvm/IR/CallSite.h" >>> #include "llvm/IR/Constants.h" >>> #include "llvm/IR/DerivedTypes.h" >>> -#include <vector> >>> >>> using namespace clang; >>> using namespace CodeGen; >>> @@ -30,29 +29,66 @@ namespace { >>> class CGNVCUDARuntime : public CGCUDARuntime { >>> >>> private: >>> - llvm::Type *IntTy, *SizeTy; >>> - llvm::PointerType *CharPtrTy, *VoidPtrTy; >>> + llvm::Type *IntTy, *SizeTy, *VoidTy; >>> + llvm::PointerType *CharPtrTy, *VoidPtrTy, *VoidPtrPtrTy; >>> + >>> + /// Convenience reference to LLVM Context >>> + llvm::LLVMContext &Context; >>> + /// Convenience reference to the current module >>> + llvm::Module &TheModule; >>> + /// Keeps track of kernel launch stubs emitted in this module >>> + llvm::SmallVector<llvm::Function *, 16> EmittedKernels; >>> + /// Keeps track of variables containing handles of GPU binaries. >>> Populated by >>> + /// ModuleCtorFunction() and used to create corresponding cleanup >>> calls in >>> + /// ModuleDtorFunction() >>> + llvm::SmallVector<llvm::GlobalVariable *, 16> GpuBinaryHandles; >>> >>> llvm::Constant *getSetupArgumentFn() const; >>> llvm::Constant *getLaunchFn() const; >>> >>> + /// Creates a function to register all kernel stubs generated in this >>> module. >>> + llvm::Function *makeRegisterKernelsFn(); >>> + >>> + /// Helper function that generates a constant string and returns a >>> pointer to >>> + /// the start of the string. 
The result of this function can be used >>> anywhere >>> + /// where the C code specifies const char*. >>> + llvm::Constant *makeConstantString(const std::string &Str, >>> + const std::string &Name = "", >>> + unsigned Alignment = 0) { >>> + llvm::Constant *Zeros[] = {llvm::ConstantInt::get(SizeTy, 0), >>> + llvm::ConstantInt::get(SizeTy, 0)}; >>> + auto *ConstStr = CGM.GetAddrOfConstantCString(Str, Name.c_str()); >>> + return >>> llvm::ConstantExpr::getGetElementPtr(ConstStr->getValueType(), >>> + ConstStr, Zeros); >>> + } >>> + >>> + void emitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args); >>> + >>> public: >>> CGNVCUDARuntime(CodeGenModule &CGM); >>> >>> - void EmitDeviceStubBody(CodeGenFunction &CGF, FunctionArgList &Args) >>> override; >>> + void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args) >>> override; >>> + /// Creates module constructor function >>> + llvm::Function *makeModuleCtorFunction() override; >>> + /// Creates module destructor function >>> + llvm::Function *makeModuleDtorFunction() override; >>> }; >>> >>> } >>> >>> -CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) : >>> CGCUDARuntime(CGM) { >>> +CGNVCUDARuntime::CGNVCUDARuntime(CodeGenModule &CGM) >>> + : CGCUDARuntime(CGM), Context(CGM.getLLVMContext()), >>> + TheModule(CGM.getModule()) { >>> CodeGen::CodeGenTypes &Types = CGM.getTypes(); >>> ASTContext &Ctx = CGM.getContext(); >>> >>> IntTy = Types.ConvertType(Ctx.IntTy); >>> SizeTy = Types.ConvertType(Ctx.getSizeType()); >>> + VoidTy = llvm::Type::getVoidTy(Context); >>> >>> CharPtrTy = >>> llvm::PointerType::getUnqual(Types.ConvertType(Ctx.CharTy)); >>> VoidPtrTy = cast<llvm::PointerType>(Types.ConvertType(Ctx.VoidPtrTy)); >>> + VoidPtrPtrTy = VoidPtrTy->getPointerTo(); >>> } >>> >>> llvm::Constant *CGNVCUDARuntime::getSetupArgumentFn() const { >>> @@ -68,14 +104,17 @@ llvm::Constant *CGNVCUDARuntime::getSetu >>> >>> llvm::Constant *CGNVCUDARuntime::getLaunchFn() const { >>> // cudaError_t cudaLaunch(char *) 
>>> - std::vector<llvm::Type*> Params; >>> - Params.push_back(CharPtrTy); >>> - return CGM.CreateRuntimeFunction(llvm::FunctionType::get(IntTy, >>> - Params, >>> false), >>> - "cudaLaunch"); >>> + return CGM.CreateRuntimeFunction( >>> + llvm::FunctionType::get(IntTy, CharPtrTy, false), "cudaLaunch"); >>> +} >>> + >>> +void CGNVCUDARuntime::emitDeviceStub(CodeGenFunction &CGF, >>> + FunctionArgList &Args) { >>> + EmittedKernels.push_back(CGF.CurFn); >>> + emitDeviceStubBody(CGF, Args); >>> } >>> >>> -void CGNVCUDARuntime::EmitDeviceStubBody(CodeGenFunction &CGF, >>> +void CGNVCUDARuntime::emitDeviceStubBody(CodeGenFunction &CGF, >>> FunctionArgList &Args) { >>> // Build the argument value list and the argument stack struct type. >>> SmallVector<llvm::Value *, 16> ArgValues; >>> @@ -87,8 +126,7 @@ void CGNVCUDARuntime::EmitDeviceStubBody >>> assert(isa<llvm::PointerType>(V->getType()) && "Arg type not >>> PointerType"); >>> >>> >>> ArgTypes.push_back(cast<llvm::PointerType>(V->getType())->getElementType()); >>> } >>> - llvm::StructType *ArgStackTy = llvm::StructType::get( >>> - CGF.getLLVMContext(), ArgTypes); >>> + llvm::StructType *ArgStackTy = llvm::StructType::get(Context, >>> ArgTypes); >>> >>> llvm::BasicBlock *EndBlock = CGF.createBasicBlock("setup.end"); >>> >>> @@ -120,6 +158,160 @@ void CGNVCUDARuntime::EmitDeviceStubBody >>> CGF.EmitBlock(EndBlock); >>> } >>> >>> +/// Creates internal function to register all kernel stubs generated in >>> this >>> +/// module with the CUDA runtime. >>> +/// \code >>> +/// void __cuda_register_kernels(void** GpuBinaryHandle) { >>> +/// __cudaRegisterFunction(GpuBinaryHandle,Kernel0,...); >>> +/// ... 
>>> +/// __cudaRegisterFunction(GpuBinaryHandle,KernelM,...); >>> +/// } >>> +/// \endcode >>> +llvm::Function *CGNVCUDARuntime::makeRegisterKernelsFn() { >>> + llvm::Function *RegisterKernelsFunc = llvm::Function::Create( >>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false), >>> + llvm::GlobalValue::InternalLinkage, "__cuda_register_kernels", >>> &TheModule); >>> + llvm::BasicBlock *EntryBB = >>> + llvm::BasicBlock::Create(Context, "entry", RegisterKernelsFunc); >>> + CGBuilderTy Builder(Context); >>> + Builder.SetInsertPoint(EntryBB); >>> + >>> + // void __cudaRegisterFunction(void **, const char *, char *, const >>> char *, >>> + // int, uint3*, uint3*, dim3*, dim3*, >>> int*) >>> + std::vector<llvm::Type *> RegisterFuncParams = { >>> + VoidPtrPtrTy, CharPtrTy, CharPtrTy, CharPtrTy, IntTy, >>> + VoidPtrTy, VoidPtrTy, VoidPtrTy, VoidPtrTy, >>> IntTy->getPointerTo()}; >>> + llvm::Constant *RegisterFunc = CGM.CreateRuntimeFunction( >>> + llvm::FunctionType::get(IntTy, RegisterFuncParams, false), >>> + "__cudaRegisterFunction"); >>> + >>> + // Extract GpuBinaryHandle passed as the first argument passed to >>> + // __cuda_register_kernels() and generate __cudaRegisterFunction() >>> call for >>> + // each emitted kernel. 
>>> + llvm::Argument &GpuBinaryHandlePtr = >>> *RegisterKernelsFunc->arg_begin(); >>> + for (llvm::Function *Kernel : EmittedKernels) { >>> + llvm::Constant *KernelName = makeConstantString(Kernel->getName()); >>> + llvm::Constant *NullPtr = llvm::ConstantPointerNull::get(VoidPtrTy); >>> + llvm::Value *args[] = { >>> + &GpuBinaryHandlePtr, Builder.CreateBitCast(Kernel, VoidPtrTy), >>> + KernelName, KernelName, llvm::ConstantInt::get(IntTy, -1), >>> NullPtr, >>> + NullPtr, NullPtr, NullPtr, >>> + llvm::ConstantPointerNull::get(IntTy->getPointerTo())}; >>> + Builder.CreateCall(RegisterFunc, args); >>> + } >>> + >>> + Builder.CreateRetVoid(); >>> + return RegisterKernelsFunc; >>> +} >>> + >>> +/// Creates a global constructor function for the module: >>> +/// \code >>> +/// void __cuda_module_ctor(void*) { >>> +/// Handle0 = __cudaRegisterFatBinary(GpuBinaryBlob0); >>> +/// __cuda_register_kernels(Handle0); >>> +/// ... >>> +/// HandleN = __cudaRegisterFatBinary(GpuBinaryBlobN); >>> +/// __cuda_register_kernels(HandleN); >>> +/// } >>> +/// \endcode >>> +llvm::Function *CGNVCUDARuntime::makeModuleCtorFunction() { >>> + // void __cuda_register_kernels(void* handle); >>> + llvm::Function *RegisterKernelsFunc = makeRegisterKernelsFn(); >>> + // void ** __cudaRegisterFatBinary(void *); >>> + llvm::Constant *RegisterFatbinFunc = CGM.CreateRuntimeFunction( >>> + llvm::FunctionType::get(VoidPtrPtrTy, VoidPtrTy, false), >>> + "__cudaRegisterFatBinary"); >>> + // struct { int magic, int version, void * gpu_binary, void * >>> dont_care }; >>> + llvm::StructType *FatbinWrapperTy = >>> + llvm::StructType::get(IntTy, IntTy, VoidPtrTy, VoidPtrTy, >>> nullptr); >>> + >>> + llvm::Function *ModuleCtorFunc = llvm::Function::Create( >>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false), >>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_ctor", >>> &TheModule); >>> + llvm::BasicBlock *CtorEntryBB = >>> + llvm::BasicBlock::Create(Context, "entry", ModuleCtorFunc); >>> + 
CGBuilderTy CtorBuilder(Context); >>> + >>> + CtorBuilder.SetInsertPoint(CtorEntryBB); >>> + >>> + // For each GPU binary, register it with the CUDA runtime and store >>> returned >>> + // handle in a global variable and save the handle in >>> GpuBinaryHandles vector >>> + // to be cleaned up in destructor on exit. Then associate all known >>> kernels >>> + // with the GPU binary handle so CUDA runtime can figure out what to >>> call on >>> + // the GPU side. >>> + for (const std::string &GpuBinaryFileName : >>> + CGM.getCodeGenOpts().CudaGpuBinaryFileNames) { >>> + llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GpuBinaryOrErr = >>> + llvm::MemoryBuffer::getFileOrSTDIN(GpuBinaryFileName); >>> + if (std::error_code EC = GpuBinaryOrErr.getError()) { >>> + CGM.getDiags().Report(diag::err_cannot_open_file) << >>> GpuBinaryFileName >>> + << EC.message(); >>> + continue; >>> + } >>> + >>> + // Create initialized wrapper structure that points to the loaded >>> GPU binary >>> + llvm::Constant *Values[] = { >>> + llvm::ConstantInt::get(IntTy, 0x466243b1), // Fatbin wrapper >>> magic. >>> + llvm::ConstantInt::get(IntTy, 1), // Fatbin version. >>> + makeConstantString(GpuBinaryOrErr.get()->getBuffer(), "", 16), >>> // Data. >>> + llvm::ConstantPointerNull::get(VoidPtrTy)}; // Unused in fatbin >>> v1. 
>>> + llvm::GlobalVariable *FatbinWrapper = new llvm::GlobalVariable( >>> + TheModule, FatbinWrapperTy, true, >>> llvm::GlobalValue::InternalLinkage, >>> + llvm::ConstantStruct::get(FatbinWrapperTy, Values), >>> + "__cuda_fatbin_wrapper"); >>> + >>> + // GpuBinaryHandle = __cudaRegisterFatBinary(&FatbinWrapper); >>> + llvm::CallInst *RegisterFatbinCall = CtorBuilder.CreateCall( >>> + RegisterFatbinFunc, >>> + CtorBuilder.CreateBitCast(FatbinWrapper, VoidPtrTy)); >>> + llvm::GlobalVariable *GpuBinaryHandle = new llvm::GlobalVariable( >>> + TheModule, VoidPtrPtrTy, false, >>> llvm::GlobalValue::InternalLinkage, >>> + llvm::ConstantPointerNull::get(VoidPtrPtrTy), >>> "__cuda_gpubin_handle"); >>> + CtorBuilder.CreateStore(RegisterFatbinCall, GpuBinaryHandle, false); >>> + >>> + // Call __cuda_register_kernels(GpuBinaryHandle); >>> + CtorBuilder.CreateCall(RegisterKernelsFunc, RegisterFatbinCall); >>> + >>> + // Save GpuBinaryHandle so we can unregister it in destructor. >>> + GpuBinaryHandles.push_back(GpuBinaryHandle); >>> + } >>> + >>> + CtorBuilder.CreateRetVoid(); >>> + return ModuleCtorFunc; >>> +} >>> + >>> +/// Creates a global destructor function that unregisters all GPU code >>> blobs >>> +/// registered by constructor. >>> +/// \code >>> +/// void __cuda_module_dtor(void*) { >>> +/// __cudaUnregisterFatBinary(Handle0); >>> +/// ... 
>>> +/// __cudaUnregisterFatBinary(HandleN); >>> +/// } >>> +/// \endcode >>> +llvm::Function *CGNVCUDARuntime::makeModuleDtorFunction() { >>> + // void __cudaUnregisterFatBinary(void ** handle); >>> + llvm::Constant *UnregisterFatbinFunc = CGM.CreateRuntimeFunction( >>> + llvm::FunctionType::get(VoidTy, VoidPtrPtrTy, false), >>> + "__cudaUnregisterFatBinary"); >>> + >>> + llvm::Function *ModuleDtorFunc = llvm::Function::Create( >>> + llvm::FunctionType::get(VoidTy, VoidPtrTy, false), >>> + llvm::GlobalValue::InternalLinkage, "__cuda_module_dtor", >>> &TheModule); >>> + llvm::BasicBlock *DtorEntryBB = >>> + llvm::BasicBlock::Create(Context, "entry", ModuleDtorFunc); >>> + CGBuilderTy DtorBuilder(Context); >>> + DtorBuilder.SetInsertPoint(DtorEntryBB); >>> + >>> + for (llvm::GlobalVariable *GpuBinaryHandle : GpuBinaryHandles) { >>> + DtorBuilder.CreateCall(UnregisterFatbinFunc, >>> + DtorBuilder.CreateLoad(GpuBinaryHandle, >>> false)); >>> + } >>> + >>> + DtorBuilder.CreateRetVoid(); >>> + return ModuleDtorFunc; >>> +} >>> + >>> CGCUDARuntime *CodeGen::CreateNVCUDARuntime(CodeGenModule &CGM) { >>> return new CGNVCUDARuntime(CGM); >>> } >>> >>> Modified: cfe/trunk/lib/CodeGen/CGCUDARuntime.h >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CGCUDARuntime.h?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/lib/CodeGen/CGCUDARuntime.h (original) >>> +++ cfe/trunk/lib/CodeGen/CGCUDARuntime.h Thu May 7 14:34:16 2015 >>> @@ -16,6 +16,10 @@ >>> #ifndef LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H >>> #define LLVM_CLANG_LIB_CODEGEN_CGCUDARUNTIME_H >>> >>> +namespace llvm { >>> +class Function; >>> +} >>> + >>> namespace clang { >>> >>> class CUDAKernelCallExpr; >>> @@ -39,10 +43,17 @@ public: >>> virtual RValue EmitCUDAKernelCallExpr(CodeGenFunction &CGF, >>> const CUDAKernelCallExpr *E, >>> ReturnValueSlot ReturnValue); >>> - >>> - virtual void 
EmitDeviceStubBody(CodeGenFunction &CGF, >>> - FunctionArgList &Args) = 0; >>> >>> + /// Emits a kernel launch stub. >>> + virtual void emitDeviceStub(CodeGenFunction &CGF, FunctionArgList >>> &Args) = 0; >>> + >>> + /// Constructs and returns a module initialization function or >>> nullptr if it's >>> + /// not needed. Must be called after all kernels have been emitted. >>> + virtual llvm::Function *makeModuleCtorFunction() = 0; >>> + >>> + /// Returns a module cleanup function or nullptr if it's not needed. >>> + /// Must be called after ModuleCtorFunction >>> + virtual llvm::Function *makeModuleDtorFunction() = 0; >>> }; >>> >>> /// Creates an instance of a CUDA runtime class. >>> >>> Modified: cfe/trunk/lib/CodeGen/CodeGenFunction.cpp >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenFunction.cpp?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/lib/CodeGen/CodeGenFunction.cpp (original) >>> +++ cfe/trunk/lib/CodeGen/CodeGenFunction.cpp Thu May 7 14:34:16 2015 >>> @@ -878,7 +878,7 @@ void CodeGenFunction::GenerateCode(Globa >>> else if (getLangOpts().CUDA && >>> !getLangOpts().CUDAIsDevice && >>> FD->hasAttr<CUDAGlobalAttr>()) >>> - CGM.getCUDARuntime().EmitDeviceStubBody(*this, Args); >>> + CGM.getCUDARuntime().emitDeviceStub(*this, Args); >>> else if (isa<CXXConversionDecl>(FD) && >>> >>> cast<CXXConversionDecl>(FD)->isLambdaToBlockPointerConversion()) { >>> // The lambda conversion to block pointer is special; the semantics >>> can't be >>> >>> Modified: cfe/trunk/lib/CodeGen/CodeGenModule.cpp >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/CodeGen/CodeGenModule.cpp?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/lib/CodeGen/CodeGenModule.cpp (original) >>> +++ cfe/trunk/lib/CodeGen/CodeGenModule.cpp Thu May 7 14:34:16 2015 
>>> @@ -350,6 +350,13 @@ void CodeGenModule::Release() { >>> if (ObjCRuntime) >>> if (llvm::Function *ObjCInitFunction = >>> ObjCRuntime->ModuleInitFunction()) >>> AddGlobalCtor(ObjCInitFunction); >>> + if (Context.getLangOpts().CUDA && !Context.getLangOpts().CUDAIsDevice >>> && >>> + CUDARuntime) { >>> + if (llvm::Function *CudaCtorFunction = >>> CUDARuntime->makeModuleCtorFunction()) >>> + AddGlobalCtor(CudaCtorFunction); >>> + if (llvm::Function *CudaDtorFunction = >>> CUDARuntime->makeModuleDtorFunction()) >>> + AddGlobalDtor(CudaDtorFunction); >>> + } >>> if (PGOReader && PGOStats.hasDiagnostics()) >>> PGOStats.reportDiagnostics(getDiags(), >>> getCodeGenOpts().MainFileName); >>> EmitCtorList(GlobalCtors, "llvm.global_ctors"); >>> @@ -3678,4 +3685,3 @@ void CodeGenModule::EmitOMPThreadPrivate >>> CXXGlobalInits.push_back(InitFunction); >>> } >>> } >>> - >>> >>> Modified: cfe/trunk/lib/Frontend/CompilerInvocation.cpp >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/lib/Frontend/CompilerInvocation.cpp?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/lib/Frontend/CompilerInvocation.cpp (original) >>> +++ cfe/trunk/lib/Frontend/CompilerInvocation.cpp Thu May 7 14:34:16 >>> 2015 >>> @@ -651,6 +651,9 @@ static bool ParseCodeGenArgs(CodeGenOpti >>> Args.getAllArgValues(OPT_fsanitize_recover_EQ), >>> Diags, >>> Opts.SanitizeRecover); >>> >>> + Opts.CudaGpuBinaryFileNames = >>> + Args.getAllArgValues(OPT_fcuda_include_gpubinary); >>> + >>> return Success; >>> } >>> >>> >>> Modified: cfe/trunk/test/CodeGenCUDA/device-stub.cu >>> URL: >>> http://llvm.org/viewvc/llvm-project/cfe/trunk/test/CodeGenCUDA/device-stub.cu?rev=236765&r1=236764&r2=236765&view=diff >>> >>> ============================================================================== >>> --- cfe/trunk/test/CodeGenCUDA/device-stub.cu (original) >>> +++ cfe/trunk/test/CodeGenCUDA/device-stub.cu Thu May 7 
14:34:16 2015 >>> @@ -1,7 +1,21 @@ >>> -// RUN: %clang_cc1 -emit-llvm %s -o - | FileCheck %s >>> +// RUN: %clang_cc1 -emit-llvm %s -fcuda-include-gpubinary %s -o - | >>> FileCheck %s >>> >>> #include "Inputs/cuda.h" >>> >>> +// Make sure that all parts of GPU code init/cleanup are there: >>> +// * constant unnamed string with the kernel name >>> +// CHECK: private unnamed_addr constant{{.*}}kernelfunc{{.*}}\00", >>> align 1 >>> +// * constant unnamed string with GPU binary >>> +// CHECK: private unnamed_addr constant{{.*}}\00" >>> +// * constant struct that wraps GPU binary >>> +// CHECK: @__cuda_fatbin_wrapper = internal constant { i32, i32, i8*, >>> i8* } >>> +// CHECK: { i32 1180844977, i32 1, {{.*}}, i64 0, i64 0), i8* >>> null } >>> +// * variable to save GPU binary handle after initialization >>> +// CHECK: @__cuda_gpubin_handle = internal global i8** null >>> +// * Make sure our constructor/destructor was added to global ctor/dtor >>> list. >>> +// CHECK: @llvm.global_ctors = appending global >>> {{.*}}@__cuda_module_ctor >>> +// CHECK: @llvm.global_dtors = appending global >>> {{.*}}@__cuda_module_dtor >>> + >>> // Test that we build the correct number of calls to cudaSetupArgument >>> followed >>> // by a call to cudaLaunch. >>> >>> @@ -11,3 +25,28 @@ >>> // CHECK: call{{.*}}cudaSetupArgument >>> // CHECK: call{{.*}}cudaLaunch >>> __global__ void kernelfunc(int i, int j, int k) {} >>> + >>> +// Test that we've built correct kernel launch sequence. >>> +// CHECK: define{{.*}}hostfunc >>> +// CHECK: call{{.*}}cudaConfigureCall >>> +// CHEKC: call{{.*}}kernelfunc >>> +void hostfunc(void) { kernelfunc<<<1, 1>>>(1, 1, 1); } >>> + >>> +// Test that we've built a function to register kernels >>> +// CHECK: define internal void @__cuda_register_kernels >>> +// CHECK: call{{.*}}cudaRegisterFunction(i8** %0, {{.*}}kernelfunc >>> + >>> +// Test that we've built contructor.. >>> +// CHECK: define internal void @__cuda_module_ctor >>> +// .. 
that calls __cudaRegisterFatBinary(&__cuda_fatbin_wrapper) >>> +// CHECK: call{{.*}}cudaRegisterFatBinary{{.*}}__cuda_fatbin_wrapper >>> +// .. stores return value in __cuda_gpubin_handle >>> +// CHECK-NEXT: store{{.*}}__cuda_gpubin_handle >>> +// .. and then calls __cuda_register_kernels >>> +// CHECK-NEXT: call void @__cuda_register_kernels >>> + >>> +// Test that we've created destructor. >>> +// CHECK: define internal void @__cuda_module_dtor >>> +// CHECK: load{{.*}}__cuda_gpubin_handle >>> +// CHECK-NEXT: call void @__cudaUnregisterFatBinary >>> + >>> >>> >>> _______________________________________________ >>> cfe-commits mailing list >>> [email protected] >>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits >>> >> >> > > > -- > --Artem Belevich > -- --Artem Belevich
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
