Thanks again. I think the problem may have been with my kernel: I got confused about the row-major versus column-major layout of the array. I thought I'd checked yesterday that it was producing the correct norms, but I must have changed something...
To get this straight: if A is a matrix in main memory and the corresponding GPU memory object is d_A = CudaArray(A), then A[i, j] = d_A[j * nrows + i]. Is that right? I guess I got confused by the discussion of transposition in the CUDArt docs.

Matthew
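A minimal sketch of the column-major offset arithmetic in question. It uses NumPy's Fortran ordering as a stand-in for Julia's column-major layout (this is not CUDArt code), and it assumes CudaArray(A) copies A's memory buffer as-is. Here i and j are 0-based, as they would be inside a CUDA kernel. With Julia's 1-based indices the same element sits at linear index (j - 1) * nrows + i. The helper names col_major_index and julia_linear_index are hypothetical.

```python
import numpy as np

def col_major_index(i, j, nrows):
    """0-based offset of element (i, j) in a column-major buffer."""
    return j * nrows + i

def julia_linear_index(i, j, nrows):
    """1-based linear index of element (i, j) in a Julia (column-major) array."""
    return (j - 1) * nrows + i

nrows, ncols = 3, 4
# Fortran (column-major) order mimics how Julia lays out a matrix in memory.
A = np.arange(nrows * ncols, dtype=np.float64).reshape((nrows, ncols), order="F")
# Flatten in memory order: this is the raw buffer a device copy would hold.
flat = A.ravel(order="F")

# Every element agrees with the column-major offset formula.
for i in range(nrows):
    for j in range(ncols):
        assert A[i, j] == flat[col_major_index(i, j, nrows)]
```

A kernel written for row-major (C) order would compute i * ncols + j instead and would read the wrong elements from this buffer.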
