It seems this was it. If I do the following:
function py_csc(A::SparseMatrixCSC)
    # create an empty sparse matrix in Python
    Apy = scipy_sparse.csc_matrix(size(A))
    # write the values
    Apy[:data] = copy(A.nzval)
    # write the indices, shifted from Julia's 1-based to Python's 0-based indexing
    Apy[:indices] = A.rowval .- 1
    Apy[:indptr] = A.colptr .- 1
    return Apy
end
py_csr(A::SparseMatrixCSC) = py_csc(A)[:tocsr]()
then replace
# B_py_csr = scipy_sparse.csr_matrix(B)
B_py_csr = py_csr(B)
I now get reasonable timings.
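
For anyone reading along, the subtraction of 1 in `py_csc` is what converts Julia's 1-based CSC index arrays into the 0-based ones SciPy expects. Here is a minimal sketch of just that step in pure Julia, with no PyCall involved. It assumes Julia 0.7 or later (where `SparseArrays` is a standard library), and the 3x3 diagonal matrix `A` is a made-up example, not the `B` from above:

```julia
using SparseArrays

# Hypothetical example matrix: 3x3 diagonal with entries 10, 20, 30.
A = sparse([1, 2, 3], [1, 2, 3], [10.0, 20.0, 30.0])

# Julia stores 1-based row indices and column pointers.
# Shift both to 0-based, which is the layout SciPy's csc_matrix uses.
indices = A.rowval .- 1   # [0, 1, 2]
indptr  = A.colptr .- 1   # [0, 1, 2, 3]

# Sanity check: the last column pointer equals the number of stored values.
@assert indptr[end] == length(A.nzval)
```

The value array `A.nzval` needs no shift, which is why `py_csc` only copies it.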
