I have found that a number of LAPACK functions have arguments for stating whether the given arrays should be treated as transposed, which effectively lets you pass C-ordered or F-ordered data without handling the layout conversion yourself. For example, I have some C++ code like so:
extern "C" { /** * Forward declaration for LAPACK's Fortran dgemm function to allow use in C/C++ code. * * This function is used for matrix multiplication between two arrays of doubles. * * For complete reference: http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html */ void dgemm_(const char* TRANSA, const char* TRANSB, const int* M, const int* N, const int* K, const double* ALPHA, const double* A, const int* LDA, const double* B, const int* LDB, const double* BETA, double* C, const int* LDC); } ... dgemm_("C", "C", &nLayers, &N, &nVariables, &alpha, matrices.IW->data(), &nVariables, inputs.data(), &N, &beta, intermediate.data(), &nLayers); (in this case, I was using boost multiarrays, but the basic idea is the same). IIRC, a bunch of other lapack functions had similar features. I hope this is helpful. Ben Root On Wed, Nov 10, 2021 at 6:02 PM Ilhan Polat <ilhanpo...@gmail.com> wrote: > I've asked this in Cython mailing list but probably I should also get some > feedback here too. > > I have the following function defined in Cython and using flat memory > pointers to hold n by n array data. > > > cdef some_C_layout_func(double[:, :, ::1] Am) nogil: # ... cdef double * > work1 = <double*>malloc(n*n*sizeof(double)) cdef double *work2 = <double*> > malloc(n*n*sizeof(double)) # ... # Lots of C-layout operations here # ... > dgetrs(<char*>'T', &n, &n, &work1[0], &n, &ipiv[0], &work2[0], &n, &info ) > dcopy(&n2, &work2[0], &int1, &Am[0, 0, 0], &int1) free(...) > > > > > > > > > > Here, I have done everything in C layout with work1 and work2 but I have > to convert work2 into Fortran layout to be able to solve AX = B. A can be > transposed in Lapack internally via the flag 'T' so the only obstacle I > have now is to shuffle work2 which holds B transpose in the eyes of Fortran > since it is still in C layout. > > If I go naively and make loops to get one layout to the other that > actually spoils all the speed benefits from this Cythonization due to cache > misses. In fact 60% of the time is spent in that naive loop across the > whole function. Same goes for the copy_fortran() of memoryviews. > > I have measured the regular NumPy np.asfortranarray() and the performance > is quite good enough compared to the actual linear solve. Hence whatever it > is doing underneath I would like to reach out and do the same possibly via > the C-API. But my C knowledge basically failed me around this line > https://github.com/numpy/numpy/blob/8dbd507fb6c854b362c26a0dd056cd04c9c10f25/numpy/core/src/multiarray/multiarraymodule.c#L1817 > > I have found the SO post from > https://stackoverflow.com/questions/45143381/making-a-memoryview-c-contiguous-fortran-contiguous > but I am not sure if that is the canonical way to do it in newer Python > versions. > > Can anyone show me how to go about it without interacting with Python > objects? > > Best, > ilhan > _______________________________________________ > NumPy-Discussion mailing list -- numpy-discussion@python.org > To unsubscribe send an email to numpy-discussion-le...@python.org > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > Member address: ben.v.r...@gmail.com >