Hello everybody,

I'm quite new to CUDA and PyCUDA.

I need a kernel that creates a matrix (of dimension n x d) out of an array (1 x d),
by simply "repeating" the same array n times.

For example, with n = 4 and d = 3, if the array is [1 2 3],

the result of my kernel should be:

[1 2 3
 1 2 3
 1 2 3
 1 2 3] (a 4x3 matrix)

Basically, it's the same as doing numpy.tile(array, (n, 1))
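
For reference, here is the CPU result I am trying to reproduce, as a minimal NumPy sketch using the small n = 4, d = 3 example from above (the variable names are only for illustration):

```python
import numpy as np

n, d = 4, 3
array = np.array([1, 2, 3], dtype=np.float32)

# Repeat the 1 x d array n times along the first axis.
expected = np.tile(array, (n, 1))

print(expected.shape)
print(expected)
```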


I've written the code below:


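import numpy as np
import pycuda.autoinit  # initializes the CUDA driver and creates a context
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

kernel_code_template = """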
__global__ void TileKernel(float *in, float *out)
{
    // Each thread computes one element of out
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    int x = blockIdx.x * blockDim.x + threadIdx.x;

    if (y > %(n)s || x > %(d)s) return;

    out[y * %(d)s + x] = in[x];
}
"""

d = 64
n = 512

blockSizex = 16
blockSizey = 16
# ceiling division; // keeps the result an integer on Python 2 and 3
gridSizex = (d + blockSizex - 1) // blockSizex
gridSizey = (n + blockSizey - 1) // blockSizey

# get the kernel code from the template
kernel_code = kernel_code_template % {
    'd': d,
    'n': n
    }
mod = SourceModule(kernel_code)
TileKernel = mod.get_function("TileKernel")

vec_cpu = np.arange(d).astype(np.float32) # just as an example
vec_gpu = gpuarray.to_gpu(vec_cpu)
out_gpu = gpuarray.empty((n, d), np.float32)

TileKernel.prepare("PP")
TileKernel.prepared_call((gridSizex, gridSizey), (blockSizex, blockSizey, 1),
                         vec_gpu.gpudata, out_gpu.gpudata)

out_cpu = out_gpu.get()
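
As a side check on the launch configuration above, this small CPU-only sketch compares the number of threads launched along x with d. The helper names grid_size and launched_threads are made up for illustration; the 16-wide block and the ceiling division are taken from the code above. When d is not a multiple of the block size, the grid launches some threads whose x is at or beyond d, so the bounds check in the kernel is what has to reject them.

```python
def grid_size(extent, block):
    """Number of blocks needed to cover `extent` elements (ceiling division)."""
    return (extent + block - 1) // block


def launched_threads(extent, block):
    """Total number of threads launched along one axis."""
    return grid_size(extent, block) * block


for d in (64, 88):
    blocks = grid_size(d, 16)
    threads = launched_threads(d, 16)
    # surplus = threads whose x index is >= d
    print(d, blocks, threads, threads - d)
```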

Now, if I run this code with d equal to a power of 2 that is >= 16, I get the right result
(the same as numpy.tile(vec_cpu, (n, 1))).
But if I set d to anything else (for example 88), every element of the output matrix has the
correct value except in the first column: there, some entries are right but others hold
another value (equal to d), and which entries of the first column are wrong changes on
every run.
I really can't figure out where the problem is, but maybe it's just something
simple that I'm missing...
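
One way to probe the symptom without a GPU might be to emulate the kernel's index arithmetic on the CPU and list the output elements that more than one thread would write. The sketch below is only a diagnostic under assumptions: the helper names are made up, the 16x16 block and ceiling-division grid are taken from the code above, and strict_guard=True swaps in a ">=" comparison purely for comparison with the posted ">" guard. It says nothing about what value an out-of-range read of `in` would return.

```python
from collections import defaultdict


def writers_per_index(n, d, block=16, strict_guard=False):
    """Map each flat output index to the (y, x) threads that would write it."""
    grid_x = (d + block - 1) // block
    grid_y = (n + block - 1) // block
    writers = defaultdict(list)
    for y in range(grid_y * block):
        for x in range(grid_x * block):
            if strict_guard:
                skip = y >= n or x >= d   # alternative guard, for comparison
            else:
                skip = y > n or x > d     # guard as posted in the kernel
            if not skip:
                writers[y * d + x].append((y, x))
    return writers


def contested(n, d, **kwargs):
    """Flat indices written by more than one thread."""
    all_writers = writers_per_index(n, d, **kwargs)
    return {idx: w for idx, w in all_writers.items() if len(w) > 1}


n, d = 512, 88
clash = contested(n, d)
print(len(clash))                          # number of contested elements
print(sorted({idx % d for idx in clash}))  # columns they fall in
print(len(contested(n, 64)))               # same check with d = 64
print(len(contested(n, d, strict_guard=True)))
```

If this emulation is right, then with n = 512 and d = 88 the only contested elements sit in the first column, each written both by thread (y, 0) and by thread (y - 1, d). Since the GPU does not order those two writes, that would fit the run-to-run variation described above. With d = 64, or with the ">=" guard, no element is contested.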

Any help will be appreciated, thanks in advance!

Best regards,
Manuele

