[PATCH] D38134: [OpenCL] Emit enqueued block as kernel

Yaxun Liu via Phabricator via cfe-commits Fri, 22 Sep 2017 10:59:05 -0700

yaxunl marked 10 inline comments as done.
yaxunl added inline comments.


================
Comment at: lib/CodeGen/CGOpenCLRuntime.cpp:113
+
+llvm::Value *CGOpenCLRuntime::emitOpenCLEnqueuedBlock(CodeGenFunction &CGF,
+                                                      const Expr *E) {
----------------
Anastasia wrote:
> yaxunl wrote:
> > Anastasia wrote:
> > > I am not particularly in favour of duplicating CodeGen functionality as 
> > > it typically has so many special cases that are hard to catch. Is this 
> > > logic needed in order to pass to block literal information  that the 
> > > block is enqueued?
> > This code is needed to emit separate functions for a block which is 
> > directly called and also enqueued as a kernel. Since the kernel needs to 
> > have proper calling convention and ABI, it cannot be emitted as the same 
> > function as when the block is called directly. Since it is OpenCL specific 
> > code, I found it is cleaner to separate this code as member of 
> > CGOpenCLRuntime instead of fitting it into CGF.EmitBlockLiteral.
> This part is replacing standard `EmitScalarExpr` call which is doing several 
> things before calling into block generation. That's why I am a bit worried we 
> are covering all the corner cases here.
> 
> So if we transform all blocks into kernels unconditionally we won't need this 
> special handling then?
> 
> Do we generate two versions of the blocks now: one for enqueue and one for 
> call?
If we transform all blocks into kernels, we could simplify the logic. Probably 
will not need this special handling.

However, when the block is called directly, the first argument is a private 
pointer, when it is executed as a kernel, the first argument is a global 
pointer or a struct (for amdgpu target), therefore the emitted functions cannot 
be the same.


================
Comment at: lib/CodeGen/CodeGenFunction.cpp:535
+    if (i == 0 && IsBlock) {
+      ty = CGF.CGM.getTargetCodeGenInfo().getEnqueuedBlockArgumentType(
+          ASTCtx, *CGF.BlockInfo);
----------------
Anastasia wrote:
> yaxunl wrote:
> > Anastasia wrote:
> > > I don't quite understand why we need to special case this? As far as I 
> > > undertsnad block argument is a `generic void* ` type but it's being cast 
> > > to a concrete block struct inside the block function. Do we gain anything 
> > > from having it a specific type here?
> > This argument is not part of BlockDecl. BlockDecl only has arguments shown 
> > up in the source code. The first argument in the LLVM block invoke function 
> > is generated by codegen and there is no correspondence in AST, so it has to 
> > be handled as a special case.
> Considering that enqueued kernel always takes the same type of the arguments 
> (`local void*`) and # of args is specified in `enqueue_kernel`, I was 
> wondering whether we need to generate the information on the kernel 
> parameters at all? The enqueueing side will have the information provided in 
> the `enqueue_kernel` code.
> 
> As for the block itself it can be passed as `generic void*` and then cast to 
> the block struct type inside the block itself.
amdgpu backend relies on kernel argument metadata to generate some metadata in 
elf for runtime to launch the kernel. The backend expects the kernel argument 
metadata on each kernel argument. Not generating kernel metadata on the first 
kernel argument requires special handling in the backend. I think it is better 
to let Clang generate kernel argument metadata for all kernel arguments.


================
Comment at: lib/CodeGen/CodeGenFunction.cpp:667
                   llvm::MDNode::get(Context, argTypeQuals));
-  if (CGM.getCodeGenOpts().EmitOpenCLArgMetadata)
+  if (CGF.CGM.getCodeGenOpts().EmitOpenCLArgMetadata)
     Fn->setMetadata("kernel_arg_name",
----------------
Anastasia wrote:
> Why this change?
CGM is no longer a function parameter since now this function requires a CGF 
parameter.


================
Comment at: test/CodeGenOpenCL/amdgpu-enqueue-kernel.cl:3
+
+// CHECK: %[[S1:struct.__amdgpu_block_arg_t.*]] = type { [3 x i64], [1 x i8] }
+// CHECK: %[[S2:struct.__amdgpu_block_arg_t.*]] = type { [5 x i64], [1 x i8] }
----------------
Anastasia wrote:
> This struct is not identical to block literal struct?
The LLVM type of the first argument of block invoke function is created 
directly with sorting and rearrangement. There is no AST type corresponding to 
it. However, the function codegen requires AST type of this argument. I feel it 
is unnecessary to create the corresponding AST type. For simplicity, just 
create an AST type with the same size and alignment as the LLVM type. In the 
function code, it will be bitcasted to the correct LLVM struct type and get the 
captured variables.


https://reviews.llvm.org/D38134



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D38134: [OpenCL] Emit enqueued block as kernel

Reply via email to