Hello PyCUDA team,

I hope you are doing well. I have been using the PyCUDA module for a while and
I have a question about choosing block/grid sizes with more than one
dimension. As a test, I wrote a very simple kernel that performs array
addition. I compute the index as `(gridDim.x * blockDim.x *
threadIdx.y) + (blockDim.x * blockIdx.x) + threadIdx.x;`, which is
two-dimensional. When I call the function with block size (512, 512, 64),
I get the error message `cuFuncSetBlockShape failed: invalid argument`.
I believe the problem lies in how I define the block size.
Also, I found that the call only fails when block.x * block.y > 1024 (with
block size (512, 2, 1), everything is fine). My notebook, which reproduces
the problem and shows my environment, GPU type, etc., is available at:
https://colab.research.google.com/drive/1KNmXwEQY7oS-nyDRkXwgl3Sw9dV3MW6P?usp=sharing
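
To make the numbers above concrete, here is a minimal sketch of the arithmetic
behind the two block shapes I tried. It does not touch the GPU. The limit of
1024 threads per block is an assumption on my part (it matches what I observe,
but the real value should be read from the device), and the helper names
`threads_per_block` and `fits` are made up for this illustration.

```python
from math import prod

# Assumption: 1024 threads per block. This matches the failures I see, but the
# actual limit depends on the GPU and should be queried from the device.
ASSUMED_MAX_THREADS_PER_BLOCK = 1024


def threads_per_block(block):
    """Return the total number of threads in an (x, y, z) block shape."""
    return prod(block)


def fits(block, limit=ASSUMED_MAX_THREADS_PER_BLOCK):
    """Return True if the block shape stays within the assumed limit."""
    return threads_per_block(block) <= limit


# The shape that works for me, followed by the shape that fails.
for block in [(512, 2, 1), (512, 512, 64)]:
    print(block, threads_per_block(block), fits(block))
```

Under that assumption, (512, 2, 1) sits exactly at the limit, while
(512, 512, 64) asks for far more threads than a single block could hold.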


All the best,
-- 
Mingzhe HU
Columbia University in the City of New York
M.S. in Electrical Engineering
mingzhe...@columbia.edu <mh4...@columbia.edu>
_______________________________________________
PyCUDA mailing list -- pycuda@tiker.net
To unsubscribe send an email to pycuda-le...@tiker.net
