Ying Wai (Daniel) Fan <y...@...> writes:

> 
> 
> > 2. When I do sgemm(a, b, c) where a and b are GPUArrays, I am getting
> > c = np.dot(b, a) instead of c = np.dot(a, b). Does GPUArray convert
> > row-major format to something else (column-major?) in its internal
> > representation? Or am I calling sgemm incorrectly?
> >   
> I have a wrapper for CUBLAS in my Python package PARRET. I know exactly 
> what is happening here. Let me just quote what I have in PARRET's 
> documentation.
> 
> Since GPUArray stores matrix entries in row-major ordering, but CUBLAS
> assumes column-major ordering, care needs to be taken when passing
> GPUArray objects as arguments.
> 
>     * No change needs to be made for BLAS 1 functions.
>     * For BLAS 2 functions, the matrix is interpreted as a transposed
>       matrix, so the transp flag needs to be set accordingly.
>     * For BLAS 3 functions, the input matrices and output matrix are
>       interpreted as transposed matrices, so the order of matrix
>       multiplication needs to be switched, while the transp flags
>       should remain unchanged.
> 
> Ying Wai (Daniel) Fan
> 
> 

That answers my question, thanks. This behavior is what confused me:

a = np.random.randn(4).astype(np.float32).reshape((2,2), order='f')
np.all(a.T == gpuarray.to_gpu(a).get())

True
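The round trip above can be simulated on the CPU, assuming (as the output suggests) that to_gpu copies the raw buffer and GPUArray then reads it back in C (row-major) order:

```python
import numpy as np

# A Fortran-ordered 2x2 whose raw buffer is copied verbatim, then
# reinterpreted row-major -- which is exactly the transpose.
a = np.random.randn(4).astype(np.float32).reshape((2, 2), order='F')
roundtrip = np.frombuffer(a.tobytes(order='A'), dtype=np.float32).reshape(2, 2)
assert np.all(roundtrip == a.T)
```

tobytes(order='A') preserves the Fortran layout of the buffer, so reshaping it in the default C order lands each column of a into a row of the result.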

Thanks for pointing me to PARRET. It looks like you have already done what I
set out to do. I may also switch to convolution instead of my current approach,
since I am not getting the desired speedup with sgemm.

There is a small bug where you load the CUDA libraries with ctypes: if the
platform is not Linux, the final 'else' branch always gets executed.
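A guarded load along these lines avoids that pitfall (the library names and fallback here are illustrative assumptions, not PARRET's actual code): the platform tests must be chained with elif so the final else is reached only when no earlier branch matches.

```python
import ctypes
import sys

# Pick the CUBLAS library name per platform. Chaining with elif
# ensures the final else is a true fallback rather than firing
# whenever the first test fails.
if sys.platform.startswith('linux'):
    libname = 'libcublas.so'
elif sys.platform == 'darwin':
    libname = 'libcublas.dylib'
else:
    libname = 'cublas.dll'

# cublas = ctypes.CDLL(libname)  # would load the library if installed
```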



_______________________________________________
PyCUDA mailing list
pyc...@host304.hostmonster.com
http://host304.hostmonster.com/mailman/listinfo/pycuda_tiker.net
