I'm pleased to announce that I have run the first dcompute kernel
and it was a success!
The driver still needs a fair bit of polish to make the API sane
and more complete, not to mention more similar to the (untested)
OpenCL driver API. But it works!
(Contributions are, of course, very welcome.)
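The kernel:
@kernel void saxpy(GlobalPointer!(float) res,
                   float alpha, GlobalPointer!(float) x,
                   GlobalPointer!(float) y, size_t N)
{
    auto i = GlobalIndex.x;
    if (i >= N) return; // guard threads that fall past the end of the data
    res[i] = alpha*x[i] + y[i];
}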
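The `if (i >= N) return;` guard in the kernel starts to matter once a launch uses more threads than there are elements, for example when N is not a multiple of the block size. Below is a minimal CPU-only sketch in plain D of that situation. `blocksFor` and the sequential loop standing in for GPU threads are hypothetical illustrations, not dcompute API, and the block size of 32 is an arbitrary assumption.

```d
import std.stdio : writeln;

// Hypothetical helper (not part of dcompute): number of blocks needed
// to cover n elements when each block holds blockSize threads.
size_t blocksFor(size_t n, size_t blockSize)
{
    return (n + blockSize - 1) / blockSize; // ceiling division
}

void main()
{
    enum size_t N = 100;        // deliberately not a multiple of blockSize
    enum size_t blockSize = 32; // assumed block size, for illustration only
    immutable blocks = blocksFor(N, blockSize);

    float[N] res = 0;
    size_t skipped;
    // Emulate every launched thread index sequentially on the CPU.
    foreach (i; 0 .. blocks * blockSize)
    {
        if (i >= N) { ++skipped; continue; } // same guard as the kernel
        res[i] = 1;
    }
    writeln(blocks, " ", skipped); // 4 28
}
```

Without the guard, the 28 surplus indices would write past the end of `res`.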
The host code:
import std.exception : enforce;
import std.experimental.allocator : theAllocator;
import std.stdio : writeln;
import dcompute.tests.dummykernels : saxpy;

auto devs = Platform.getDevices(theAllocator);
auto ctx = Context(devs); scope(exit) ctx.detach();
// Change the file to match your GPU.
auto q = Queue(false);

enum size_t N = 128;
float alpha = 5.0;
float[N] res, x, y;
foreach (i; 0 .. N)
{
    x[i] = N - i;
    y[i] = i * i;
}

Buffer!(float) b_res, b_x, b_y;
b_res = Buffer!(float)(res); scope(exit) b_res.release();
b_x = Buffer!(float)(x); scope(exit) b_x.release();
b_y = Buffer!(float)(y); scope(exit) b_y.release();
b_x.copy!(Copy.hostToDevice); // not quite sold on this interface

q.enqueue!(saxpy)              // <-- the main magic happens here
    ([N,1,1],[1,1,1])          // the grid
    (b_res,alpha,b_x,b_y, N);  // the kernel arguments

foreach (i; 0 .. N)
    enforce(res[i] == alpha * x[i] + y[i]);
writeln(res); // [640, 636, ... 16134]
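For anyone without a CUDA setup, the expected values in the `writeln` comment can be checked on the CPU. This is a minimal sketch in plain D using the same inputs as the host code. `saxpyReference` is a hypothetical helper written for this check and is not part of dcompute.

```d
import std.stdio : writeln;

// Hypothetical CPU reference (not part of dcompute) for the kernel body:
// res[i] = alpha * x[i] + y[i]
void saxpyReference(float[] res, float alpha,
                    const(float)[] x, const(float)[] y)
{
    assert(res.length == x.length && x.length == y.length);
    foreach (i; 0 .. res.length)
        res[i] = alpha * x[i] + y[i];
}

void main()
{
    enum size_t N = 128;
    float alpha = 5.0;
    float[N] res, x, y;
    // Same inputs as the host code above.
    foreach (i; 0 .. N)
    {
        x[i] = N - i;
        y[i] = i * i;
    }
    saxpyReference(res[], alpha, x[], y[]);
    writeln(res[0], " ", res[1], " ", res[N - 1]); // 640 636 16134
}
```

The first, second and last values match the `[640, 636, ... 16134]` printed by the GPU run.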
Simple as that!
Dcompute, as always, is at https://github.com/libmir/dcompute and
To run the dcompute CUDA test successfully, you will need a very
recent LDC (no more than two days old) with the NVPTX backend* enabled,
along with a CUDA environment and an Nvidia GPU.
*Or wait for the LDC 1.4 release, coming real soon(™).
Thanks to the LDC folks for putting up with me ;)
Have fun GPU programming,