Can I have a standard Julia "for loop" inside a "device do" of CUDArt?
I tried the following example:
using CUDArt, MyCudaModule
nrow = 10
ncol = 3000
mat = ones(Float64,nrow,ncol)
out1 = zeros(Float64,nrow)
vec = Float64[1:nrow;]
out2 = zeros(Float64,nrow)
d_mat = CudaArray(mat)
d_out1 = CudaArray(out1)
d_vec = CudaArray(vec)
d_out2 = CudaArray(out2)
d_nrow = CudaArray(Int32[nrow;])
d_ncol = CudaArray(Int32[ncol;])
result = devices(dev->capability(dev)[1]>=2) do devlist
    MyCudaModule.init(devlist) do dev
        blocks = 1
        threads = nrow
        global result = 0
        result = for i in 1:10
            MyCudaModule.cudaSumCol(d_out1,d_mat,d_ncol,blocks,threads)
            result = to_host(d_out1)[1]
        end
    end
end
cudaSumCol is a function that simply sums a matrix's entries, converting it
into a column; it was wrapped just like the example in CUDArt's README.
The above code works perfectly without the loop part.
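For reference, here is a CPU-only sketch of what I expect the loop to do. The function sum_rows! is a hypothetical stand-in for cudaSumCol (it is not part of CUDArt or of my module), and I am assuming the kernel sums each row across the columns. It also shows one thing that may matter here: in plain Julia a for loop used as an expression evaluates to nothing, so the value has to be read from the output array after the loop.

```julia
# Hypothetical CPU stand-in for cudaSumCol (assumption: one sum per row).
function sum_rows!(out, mat)
    for i in 1:size(mat, 1)
        s = 0.0
        for j in 1:size(mat, 2)
            s += mat[i, j]
        end
        out[i] = s
    end
    return out
end

mat = ones(Float64, 10, 3000)
out = zeros(Float64, 10)

# The loop runs for its side effects on `out`; as an expression it
# evaluates to `nothing`, not to the last value computed in its body.
loop_value = for i in 1:10
    sum_rows!(out, mat)
end

# Read the result from the output array once the loop has finished.
first_entry = out[1]
```

If the same holds inside the do block, assigning the loop to result would overwrite whatever was stored in the loop body, but I have not confirmed that this is the cause on the GPU side.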
Should I try something different, like not using the do devlist?
Thanks,
Joaquim