Hello PyCUDA team,

I hope you are doing well. I have been using the PyCUDA module for a while, and I have a question about setting the block and grid sizes when they have more than one dimension. For instance, I wrote a very simple kernel function that adds two arrays. I defined my index as `(gridDim.x * blockDim.x * threadIdx.y) + (blockDim.x * blockIdx.x) + threadIdx.x;`, which is two-dimensional. When I tried to call the function with block size (512, 512, 64), I got the error message `cuFuncSetBlockShape failed: invalid argument`. I believe the problem lies in how I defined the block size. I also found that the launch fails only when block.x * block.y > 1024; with block size (512, 2, 1), everything works.

A closer look at my problem, including my environment and GPU type, is available in this Colab notebook: https://colab.research.google.com/drive/1KNmXwEQY7oS-nyDRkXwgl3Sw9dV3MW6P?usp=sharing
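The error is consistent with the per-block thread limit: a block of (512, 512, 64) would contain 16,777,216 threads, while NVIDIA GPUs of compute capability 2.0 and later allow at most 1024 threads per block, with block dimensions of at most (1024, 1024, 64). The pure-Python sketch below needs no GPU and simply assumes those limits; on a real device they can be read with `pycuda.driver.Device(0).get_attribute(pycuda.driver.device_attribute.MAX_THREADS_PER_BLOCK)` and the corresponding `MAX_BLOCK_DIM_X`, `MAX_BLOCK_DIM_Y`, `MAX_BLOCK_DIM_Z` attributes. It checks a block shape, computes a 2D grid by ceiling division (so a large array is covered by more blocks rather than bigger ones), and compares the index from the question with the conventional row-major index. All helper names here (`check_block_shape`, `grid_for`, `question_index`, `row_major_index`, `distinct_indices`) are hypothetical.

```python
from itertools import product

# Assumption: per-block limits of NVIDIA GPUs with compute capability >= 2.0.
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)


def check_block_shape(block):
    """Return the reasons an (x, y, z) block shape would be rejected; empty if valid."""
    problems = []
    for axis, (size, limit) in enumerate(zip(block, MAX_BLOCK_DIM)):
        if not 1 <= size <= limit:
            problems.append(f"block[{axis}]={size} is outside 1..{limit}")
    total = block[0] * block[1] * block[2]
    if total > MAX_THREADS_PER_BLOCK:
        problems.append(f"{total} threads per block exceeds {MAX_THREADS_PER_BLOCK}")
    return problems


def grid_for(n_rows, n_cols, block):
    """Smallest 2D grid (x, y) whose blocks cover an n_rows x n_cols array."""
    bx, by = block[0], block[1]
    return ((n_cols + bx - 1) // bx, (n_rows + by - 1) // by)


def question_index(bid, tid, bdim, gdim):
    """The index expression from the question; it never reads blockIdx.y."""
    return gdim[0] * bdim[0] * tid[1] + bdim[0] * bid[0] + tid[0]


def row_major_index(bid, tid, bdim, gdim):
    """Conventional 2D index: row * width + col, using blockIdx.y as well."""
    col = bid[0] * bdim[0] + tid[0]
    row = bid[1] * bdim[1] + tid[1]
    return row * (gdim[0] * bdim[0]) + col


def distinct_indices(index_fn, bdim, gdim):
    """Count the distinct indices produced by every thread of a 2D launch."""
    seen = set()
    for bx, by, tx, ty in product(range(gdim[0]), range(gdim[1]),
                                  range(bdim[0]), range(bdim[1])):
        seen.add(index_fn((bx, by), (tx, ty), bdim, gdim))
    return len(seen)


if __name__ == "__main__":
    print(check_block_shape((512, 512, 64)))  # one problem: too many threads per block
    print(check_block_shape((512, 2, 1)))     # [] (exactly 1024 threads)
    print(grid_for(1000, 1000, (32, 32, 1)))  # (32, 32)
    # 4x4 blocks on a 2x2 grid: 64 threads in total.
    print(distinct_indices(question_index, (4, 4), (2, 2)))   # 32
    print(distinct_indices(row_major_index, (4, 4), (2, 2)))  # 64
```

In the demo, the index from the question yields only 32 distinct values for 64 threads once gridDim.y is 2, because it never reads blockIdx.y; the row-major form covers all 64. The resulting block and grid tuples would then be passed to the kernel through PyCUDA's `block=` and `grid=` keyword arguments, e.g. `block=(32, 32, 1)` with the grid returned by `grid_for`.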
All the best,
--
Mingzhe HU
Columbia University in the City of New York
M.S. in Electrical Engineering
mingzhe...@columbia.edu <mh4...@columbia.edu>
PyCUDA mailing list -- pycuda@tiker.net