On Thursday, 18 May 2017 at 09:07:38 UTC, Nicholas Wilson wrote:
> When ldc runs you will get a kernels_cudaxxx_yy.ptx (where xxx is the CUDA compute capability specified on the command line and yy is 32 or 64 for 32- or 64-bit) which should fit somewhere into your existing C++ pipeline.

Whoops, that assumes you have a CUDA driver API pipeline in your C++ code, which, given that you're asking, I'm not sure you have. If you're using the `kernel<<<...>>>(args)` form to launch your kernels then you are going to have a lot more work to do in D, because you'll need to use the driver API (http://docs.nvidia.com/cuda/cuda-driver-api/#axzz4hQLA0Zdm).
You'll need to (see the sketch after this list):
*get a device
*create a context from it
*get a stream on that context
*load the ptx module (possibly linking it with other modules, to resolve missing symbols).
*compile it for the device
*then launch a kernel from that module on that device by name, passing the arguments in a void*[].
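To make that concrete, here's a minimal sketch of those steps in C against the driver API. The PTX file name, kernel name and argument types are placeholders for whatever ldc emits for your code, and real code should report cuGetErrorString output rather than the raw error number.

```c
#include <cuda.h>
#include <stdio.h>

/* Tiny error-check helper; real code should print cuGetErrorString output. */
#define CHECK(call) do { CUresult r_ = (call); if (r_ != CUDA_SUCCESS) { \
    fprintf(stderr, "driver API error %d at line %d\n", (int)r_, __LINE__); return 1; } } while (0)

int main(void)
{
    CHECK(cuInit(0));                          /* initialise the driver API    */

    CUdevice dev;
    CHECK(cuDeviceGet(&dev, 0));               /* get a device                 */

    CUcontext ctx;
    CHECK(cuCtxCreate(&ctx, 0, dev));          /* create a context from it     */

    CUstream stream;
    CHECK(cuStreamCreate(&stream, 0));         /* get a stream on that context */

    /* Load the PTX ldc emitted; the driver JIT-compiles it for this device.
       File and kernel names are placeholders.                                 */
    CUmodule mod;
    CHECK(cuModuleLoad(&mod, "kernels_cuda350_64.ptx"));

    CUfunction kern;
    CHECK(cuModuleGetFunction(&kern, mod, "myKernel"));

    /* Arguments go in as an array of pointers to the actual values (void*[]). */
    size_t n = 1024;
    CUdeviceptr dbuf;
    CHECK(cuMemAlloc(&dbuf, n * sizeof(float)));
    int count = (int)n;
    void *args[] = { &dbuf, &count };

    CHECK(cuLaunchKernel(kern,
                         (unsigned)((n + 255) / 256), 1, 1,  /* grid  */
                         256, 1, 1,                          /* block */
                         0, stream, args, NULL));
    CHECK(cuStreamSynchronize(stream));

    cuMemFree(dbuf);
    cuModuleUnload(mod);
    cuCtxDestroy(ctx);
    return 0;
}
```

You link against the driver library (-lcuda) rather than the runtime; depending on your setup you may also need to point the compiler at the CUDA include and lib directories.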

The sad thing is that it's still nicer than OpenCL, because in OpenCL you have to pass the runtime args (with their sizes) one by one to a function.
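For comparison, the OpenCL side looks roughly like this (a fragment only; the queue, kernel and buffer are assumed to already exist, and the kernel is assumed to take a buffer and an int):

```c
#include <CL/cl.h>

/* Every argument is set individually, with its size, before enqueueing.
   All names here are placeholders. */
cl_int launch(cl_command_queue queue, cl_kernel kern, cl_mem dbuf, cl_int count)
{
    cl_int err = clSetKernelArg(kern, 0, sizeof(cl_mem), &dbuf);
    if (err != CL_SUCCESS) return err;
    err = clSetKernelArg(kern, 1, sizeof(cl_int), &count);
    if (err != CL_SUCCESS) return err;

    size_t global = 1024, local = 256;
    return clEnqueueNDRangeKernel(queue, kern, 1, NULL, &global, &local,
                                  0, NULL, NULL);
}
```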

Hence I want to automate as much of that shit as possible.
I hope to have that done ASAP, but I don't have hardware set up to test CUDA at the moment (I have one on my Windows box, but I don't have a dev environment set up there), and I'll be working on OpenCL at the same time (and there's only so much horrible API I can take in a day). I'll be working on dcompute part-part-time next semester though, so I should be able to get a fair bit done, and quite a few others are interested, so that'll speed things up a bit.
