Ying Wai (Daniel) Fan <y...@...> writes:
> > 2. When I do sgemm(a, b, c) where a and b are gpuarray's, I am getting
> > c = np.dot(b, a) instead of c = np.dot(a, b). Does gpuarray convert
> > row major format to something else (column?) in its internal
> > representation? Or am I calling sgemm incorrectly?
>
> I have a wrapper for CUBLAS in my Python package PARRET, so I know
> exactly what is happening here. Let me just quote what I have in
> PARRET's documentation:
>
> Since GPUArray stores matrix entries in row-major ordering, but CUBLAS
> assumes column-major ordering, caution needs to be taken when passing
> GPUArray objects as arguments.
>
> * No change needs to be made for BLAS 1 functions.
> * For BLAS 2 functions, the matrix is interpreted as its transpose,
>   so the transp flag needs to be set accordingly.
> * For BLAS 3 functions, the input matrices and the output matrix are
>   interpreted as their transposes, so the order of matrix
>   multiplication needs to be switched, while the transp flags should
>   remain unchanged.
>
> Ying Wai (Daniel) Fan
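[Editor's note: the BLAS 3 rule quoted above can be illustrated without a GPU. The toy `colmajor_sgemm` below is a hypothetical stand-in for CUBLAS sgemm (it only mimics the column-major buffer interpretation); swapping the operand order recovers the row-major product, exactly as the documentation describes.]

    import numpy as np

    def colmajor_sgemm(a_buf, b_buf, m, k, n):
        """Toy stand-in for a column-major sgemm: interpret the flat
        buffers as column-major (m, k) and (k, n) matrices and return
        the flat column-major (m, n) product."""
        A = np.asarray(a_buf, dtype=np.float32).reshape((m, k), order='F')
        B = np.asarray(b_buf, dtype=np.float32).reshape((k, n), order='F')
        return (A @ B).ravel(order='F')

    # Row-major matrices, as NumPy (and GPUArray) store them:
    A = np.arange(6, dtype=np.float32).reshape(2, 3)    # (2, 3)
    B = np.arange(12, dtype=np.float32).reshape(3, 4)   # (3, 4)

    # A (2, 3) row-major buffer *is* a (3, 2) column-major buffer
    # holding A^T, and likewise for B.  Since (A B)^T = B^T A^T,
    # swapping the operands makes the column-major routine compute
    # C^T, whose column-major buffer is exactly row-major C:
    c_buf = colmajor_sgemm(B.ravel(), A.ravel(), 4, 3, 2)
    C = c_buf.reshape(2, 4)   # reinterpret row-major: this is A @ B

    assert np.allclose(C, A @ B)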
That answers my question, thanks. This behavior is what confused me:

    a = np.random.randn(4).astype(np.float32).reshape((2, 2), order='f')
    np.all(a.T == gpuarray.to_gpu(a).get())
    True

Thanks for pointing me to PARRET. You have already done what I wanted
to do, I hope. I may also use convolution instead of what I was doing,
as I am not getting the desired speedup with sgemm.

There is a small typo where you load the CUDA libraries with ctypes:
if the platform is not Linux, the final 'else' always gets executed.

_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
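[Editor's note: PARRET's actual loader code is not shown in the thread, so the snippet below is a hypothetical reconstruction of the bug pattern being reported: a platform dispatch whose bare `else` silently treats every non-Linux platform as one specific OS. Raising on unrecognized platforms avoids that.]

    import sys

    def cublas_libname(platform=None):
        """Pick the CUBLAS shared-library name for a platform,
        raising on anything unrecognized instead of letting a
        bare `else` swallow unsupported platforms."""
        platform = platform or sys.platform
        if platform.startswith('linux'):
            return 'libcublas.so'
        elif platform == 'darwin':
            return 'libcublas.dylib'
        elif platform in ('win32', 'cygwin'):
            return 'cublas.dll'
        raise OSError('unsupported platform: %r' % (platform,))

    # ctypes.CDLL(cublas_libname()) would then load the right library.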