Re: [PATCH] End-to-end CUDA compilation.

Artem Belevich Mon, 06 Apr 2015 11:09:28 -0700

================
Comment at: include/clang/Driver/Action.h:140
@@ +139,3 @@
+  virtual void anchor();
+  /// GPU architecture to bind
+  const char *GpuArchName;
----------------
eliben wrote:
> Can you give an example in this comment? like sm_30, etc.
Done


================
Comment at: include/clang/Driver/Action.h:144
@@ +143,3 @@
+public:
+  CudaDeviceAction(std::unique_ptr<Action> Input, const char *_ArchName);
+
----------------
eliben wrote:
> IIRC _[A-Z] names are discouraged, and against the style anyway
Done.

================
Comment at: include/clang/Driver/CC1Options.td:611
@@ -610,1 +610,3 @@
 
+def cuda_include_gpucode : Separate<["-"], "cuda-include-gpucode">,
+  HelpText<"Incorporate CUDA device-side code.">;
----------------
eliben wrote:
> I'm wondering about the "gpucode" mnemonic :-) It's unusual and kinda 
> ambiguous. What does gpucode mean here? PTX? Maybe PTX can be more explicit 
> then?
> 
> PTX is probably not too specific since this flag begins with "cuda_" so it's 
> already about the CUDA/PTX flow.
> 
> [this applies to other uses of "gpucode" too]
It's actually an opaque blob. clang does not care what's in the file as it just 
passes the bits to cudart which passes it to the driver. The driver can digest 
PTX (which we pass in this case), but it will as happily accept GPU code packed 
in fatbin or cubin formats. If/when we grow ability to compile device-side to 
SASS, we would just  do "-cuda-include-gpucode gpu-code-packed-in.cubin" and it 
should work with no other changes on the host side.

So, 'gpucode' was the best approximation I could come up with that would keep 
"GPU code in any shape or form as long as it's PTX/fatbin or cubin".

I'd be happy to change it. Suggestions?

================
Comment at: include/clang/Driver/Options.td:456
@@ -455,1 +455,3 @@
 def fcreate_profile : Flag<["-"], "fcreate-profile">, Group<f_Group>;
+def fcuda_no_device : Flag<["-"], "fcuda-no-device">,
+  HelpText<"Disable device-side CUDA compilation">;
----------------
eliben wrote:
> Is it possible to make these flags positive, with false-by-default values?
> 
> 
Sure. Changed the options to -fcuda-host-only/-fcuda-device-only




================
Comment at: include/clang/Driver/Options.td:1074
@@ +1073,3 @@
+  HelpText<"CUDA GPU architecture">;
+def gpu_architecture_EQ : Joined<["--"], "gpu-architecture=">,
+  Flags<[DriverOption]>, Alias<gpu_architecture>;
----------------
eliben wrote:
> What is this for?
I've added for (partial) compatibility with nvcc. I've removed it for now as 
drop-in nvcc compatibility is not the purpose of this patch.

================
Comment at: include/clang/Frontend/CodeGenOptions.h:164
@@ +163,3 @@
+  /// List of CUDA GPU code blobs to incorporate
+  std::vector<std::string> CudaGpuCodeFiles;
+
----------------
eliben wrote:
> s/Files/Blobs/ or "strings"? And as above, maybe PTX would be better than 
> "GpuCode"
It's a vector of strings containing names of files that contain GPU code blobs, 
whatever their format may be. I'll rename the variable to CudaGpuCodeFileNames 
and will update the comment to reflect that.

How about this?
  /// A list of file names passed with -cuda-include-gpucode options to forward
  /// to CUDA runtime back-end for incorporating them into host-side object
  /// file.
  std::vector<std::string> CudaGpuCodeFileNames;



================
Comment at: lib/CodeGen/CGCUDANV.cpp:47
@@ -37,1 +46,3 @@
   llvm::Constant *getLaunchFn() const;
+  llvm::Constant *getRegisterFunctionFn() const;
+
----------------
eliben wrote:
> Put doc comments for the new functions/methods you're adding, and preferably 
> for the data fields as well, unless they're completely obvious
Moved some fields out of the class and into local variables where they are used.
Documented the rest.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:87
@@ +86,3 @@
+
+  Zeros[0] = llvm::ConstantInt::get(SizeTy, 0);
+  Zeros[1] = Zeros[0];
----------------
eliben wrote:
> Do you really need Zeros as a member? You only use it once. Also, if you just 
> declare it you can use the nice C++11 {...} initializer in the place of use, 
> making the code even shorter.
Done. Also moved number of other things with single use down to where they are 
used.


================
Comment at: lib/CodeGen/CGCUDANV.cpp:116
@@ -71,2 +115,3 @@
   std::vector<llvm::Type*> Params;
+  Params.push_back(VoidPtrPtrTy);
   Params.push_back(CharPtrTy);
----------------
eliben wrote:
> Use C++11 {...} initialization?
OK.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:177
@@ +176,3 @@
+// void .cuda_register_kernels(void** GpuBlobHandle) {
+//  // for (Kernel : EmittedKernels) {
+//    __cudaRegisterFunction(GpuBlobHandle,Kernel)
----------------
eliben wrote:
> If this is pseudocode example, second level of // comments is superfluous
The idea I wanted to convey is that I'm not really generating the loop, but 
rather rather generate a call for each kernel, in effect unrolling the loop. 

I've changed pseudocode to linear sequence of calls which is what those 
functions really generate.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:193
@@ +192,3 @@
+  llvm::Argument &BlobHandlePtr = *RegisterKernelsFunc->arg_begin();
+  for (llvm::Function *Kernel : EmittedKernels) {
+    llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
----------------
eliben wrote:
> const?
Nope. CreateBitCast wants non-const Function*:

../../../tools/clang/lib/CodeGen/CGCUDANV.cpp:198:31: error: cannot initialize 
a parameter of type 'llvm::Value *' with an lvalue of type 'const 
llvm::Function *'
        Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr


================
Comment at: lib/CodeGen/CGCUDANV.cpp:195
@@ +194,3 @@
+    llvm::Constant *KernelName = MakeConstantString(Kernel->getName());
+    llvm::Value *args[] = {
+        &BlobHandlePtr, // pointer to fatbin handler pointer
----------------
eliben wrote:
> Can you document the C signature of the called function somewhere for clarity?
I've moved CreateRuntimeFunction(...,"__cudaRegisterFunction") along with its 
signature in the comments into makeRegisterKernelsFn, so it should be visible 
close to where it's used.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:197
@@ +196,3 @@
+        &BlobHandlePtr, // pointer to fatbin handler pointer
+        // Builder.CreatePointerCast(nullptr, CharPtrTy), // kernel stub addr
+        Builder.CreateBitCast(Kernel, VoidPtrTy), // kernel stub addr
----------------
eliben wrote:
> leftovers?
Yes. Removed.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:219
@@ +218,3 @@
+// void .cuda_module_ctor(void*) {
+//   // for (GpuCodeBlob : GpuCodeBlobs) {
+//          Handle = __cudaRegisterFatBinary(GpuCodeBlob);
----------------
eliben wrote:
> same here re second-level //
Fixed.

================
Comment at: lib/CodeGen/CGCUDANV.cpp:251
@@ +250,3 @@
+    // Create initialized wrapper structure that points to the loaded GPU blob.
+    llvm::Constant *Values[4] = {
+        llvm::ConstantInt::get(IntTy, FatbinWrapperMagic),
----------------
eliben wrote:
> Is the 4 in [4] needed?
Not really. Removed.

================
Comment at: lib/CodeGen/CGCUDARuntime.h:42
@@ -34,1 +41,3 @@
 
+  llvm::SmallVector<llvm::Function *, 16> EmittedKernels;
+  llvm::SmallVector<llvm::GlobalVariable *, 16> FatbinHandles;
----------------
eliben wrote:
> It would really be great not to have data inside this abstract interface; is 
> this necessary?
> 
> Note that "fatbin handles" sounds very NVIDIA CUDA runtime specific, though 
> this interface is allegedly generic :)
List of generated kernels is something that I expect to be useful for all 
subclasses of CUDARuntime. 
That's why I've put EmittedKernels there and a non-virtual 
methodEmitDeviceStub() to populate it.

FatbinHandles, on the other hand, is indeed cudart-specific. I've moved it into 
CGCUDANV.

================
Comment at: lib/CodeGen/CGCUDARuntime.h:53
@@ +52,3 @@
+
+  virtual void EmitDeviceStub(CodeGenFunction &CGF, FunctionArgList &Args);
+
----------------
eliben wrote:
> Please document these APIs
Done.

http://reviews.llvm.org/D8463

EMAIL PREFERENCES
  http://reviews.llvm.org/settings/panel/emailpreferences/



_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Re: [PATCH] End-to-end CUDA compilation.

Reply via email to