I have found that a bunch of lapack functions seem to have arguments for
stating whether or not the given arrays are C or F ordered. Then you
wouldn't need to worry about handling the layout yourself. For example, I
have some C++ code like so:

extern "C" {

/**
 * Forward declaration for LAPACK's Fortran dgemm function to allow use in
C/C++ code.
 *
 * This function is used for matrix multiplication between two arrays of
doubles.
 *
 * For complete reference:
http://www.netlib.org/lapack/explore-html/d1/d54/group__double__blas__level3_gaeda3cbd99c8fb834a60a6412878226e1.html
 */
void dgemm_(const char* TRANSA, const char* TRANSB, const int* M, const
int* N, const int* K,
    const double* ALPHA, const double* A, const int* LDA, const double* B,
const int* LDB,
    const double* BETA, double* C, const int* LDC);
}

...

dgemm_("C", "C", &nLayers, &N, &nVariables, &alpha, matrices.IW->data(),
&nVariables,
            inputs.data(), &N, &beta, intermediate.data(), &nLayers);

(in this case, I was using boost multiarrays, but the basic idea is the
same). IIRC, a bunch of other lapack functions had similar features.

I hope this is helpful.

Ben Root



On Wed, Nov 10, 2021 at 6:02 PM Ilhan Polat <ilhanpo...@gmail.com> wrote:

> I've asked this in Cython mailing list but probably I should also get some
> feedback here too.
>
> I have the following function defined in Cython and using flat memory
> pointers to hold n by n array data.
>
>
> cdef some_C_layout_func(double[:, :, ::1] Am) nogil: # ... cdef double *
> work1 = <double*>malloc(n*n*sizeof(double)) cdef double *work2 = <double*>
> malloc(n*n*sizeof(double)) # ... # Lots of C-layout operations here # ...
> dgetrs(<char*>'T', &n, &n, &work1[0], &n, &ipiv[0], &work2[0], &n, &info )
> dcopy(&n2, &work2[0], &int1, &Am[0, 0, 0], &int1) free(...)
>
>
>
>
>
>
>
>
>
> Here, I have done everything in C layout with work1 and work2 but I have
> to convert work2 into Fortran layout to be able to solve AX = B. A can be
> transposed in Lapack internally via the flag 'T' so the only obstacle I
> have now is to shuffle work2 which holds B transpose in the eyes of Fortran
> since it is still in C layout.
>
> If I go naively and make loops to get one layout to the other that
> actually spoils all the speed benefits from this Cythonization due to cache
> misses. In fact 60% of the time is spent in that naive loop across the
> whole function. Same goes for the copy_fortran() of memoryviews.
>
> I have measured the regular NumPy np.asfortranarray()  and the performance
> is quite good enough compared to the actual linear solve. Hence whatever it
> is doing underneath I would like to reach out and do the same possibly via
> the C-API. But my C knowledge basically failed me around this line
> https://github.com/numpy/numpy/blob/8dbd507fb6c854b362c26a0dd056cd04c9c10f25/numpy/core/src/multiarray/multiarraymodule.c#L1817
>
> I have found the SO post from
> https://stackoverflow.com/questions/45143381/making-a-memoryview-c-contiguous-fortran-contiguous
> but I am not sure if that is the canonical way to do it in newer Python
> versions.
>
> Can anyone show me how to go about it without interacting with Python
> objects?
>
> Best,
> ilhan
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: ben.v.r...@gmail.com
>
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to