Thanks again.

I think the problem may have been with my kernel: I got confused about 
row-major versus column-major ordering of the array's layout. 
I thought I'd checked yesterday that it was producing the correct norms, but I 
must have changed something...

To get this straight: if A is a matrix in main memory and d_A = CudaArray(A) 
is the corresponding GPU memory object, then:

A[i, j] = d_A[j * nrows + i]    (with i and j zero-based, as inside the kernel)

Is that right? I guess I got confused by the discussion of transposition in 
the CUDArt docs.
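
A quick way to check the layout is to simulate it. The sketch below is in 
Python rather than Julia or CUDA, because Python's zero-based indexing matches 
the kernel's. It assumes CudaArray copies the host buffer as-is, so the 
column-major order of the Julia matrix carries over unchanged to the device. 
The helper names to_column_major and col_major_index are hypothetical. (On 
the Julia side, with one-based indices, the same rule reads 
A[i, j] == vec(A)[(j - 1) * nrows + i].)

```python
# Sketch: simulate the column-major layout of a Julia matrix in memory.
# Assumption: CudaArray copies this buffer unchanged, so d_A uses the same order.

def to_column_major(rows):
    """Flatten a list of rows into one list, column by column (hypothetical helper)."""
    nrows = len(rows)
    ncols = len(rows[0])
    return [rows[i][j] for j in range(ncols) for i in range(nrows)]

def col_major_index(i, j, nrows):
    """Zero-based linear offset of element (i, j) in a column-major buffer."""
    return j * nrows + i

A = [[11, 12, 13],
     [21, 22, 23]]          # 2 x 3 matrix, A[i][j] with zero-based i, j
nrows, ncols = 2, 3

flat = to_column_major(A)
print(flat)                 # [11, 21, 12, 22, 13, 23]

# Every element is found at offset j * nrows + i.
for i in range(nrows):
    for j in range(ncols):
        assert A[i][j] == flat[col_major_index(i, j, nrows)]
```

In a row-major (C-style) layout the offset would instead be i * ncols + j, 
which is the ordering it is easy to slip into when writing the kernel.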

Matthew
