Clarifying question: In #177, it says, "Currently there's no way of creating efficient functions taking a buffer argument -- for starters cdef functions doesn't support declaring an argument as a buffer, and even if it did, it would mean a costly (in this context) buffer acquisition on every call."
More precisely, should it say "There's no way of creating efficient functions taking a buffer argument /using the buffer syntax/...", since one could always just have the function take a Py_buffer(*) argument, right?

On Fri, Apr 24, 2009 at 3:38 AM, Dag Sverre Seljebotn <[email protected]> wrote:
> (This is a move of my and Kurt's discussion into the public; I'll give
> some context but not that much.)
>
> Kurt: This will go into a broader discussion of buffers in Cython as such
> and not be directly Fortran-relevant. I want to have Fortran support
> integrated in a nice way into Cython, so I think this is necessary to some
> degree, but I'll try not to overdo it.
>
> So, you wanted to target contiguous buffer passing first. And I was
> saying that fixing
>
> http://trac.cython.org/cython_trac/ticket/177
>
> would for the most part amount to what we need. Adding support for
>
>     cdef void myfunc(object[int, mode="c"] buf)
>
> which would have the C signature
>
>     void myfunc(Py_buffer* buf) // + a contract on int/contiguous
>
> and have the caller be responsible for getting the buffer, gets us far.
> It would mean that you could do
>
>     myfunc(myobject)
>
> and it would turn into
>
>     Py_buffer buf = acquire buffer from myobject
>     raise exception if buf is not contiguous
>     myfunc(&buf)
>     release buf

Just stating my knowledge as it is right now -- for

    def func2(object[int, mode='c'] buf):
        # body here

the 'buf' is passed in as a Python object and a stack Py_buffer is generated inside func2's body that acquires the buffer from buf. So for 'def' functions the buffer is acquired/released within the function scope, while for 'cdef' functions it is acquired/released outside. I'd imagine that cpdef functions would be outside, too.

> I first thought that adding one component more would get us all the way:
> Automatic copying into contiguous buffers.
> So assuming #177 is implemented, one would then move on to having it
> instead turn into:
>
>     Py_buffer buf = acquire buffer from myobject
>     if (buf is contiguous) {
>         myfunc(&buf);
>     } else {
>         Py_buffer buf2 = make contiguous copy of buf
>         myfunc(&buf2)
>     }
>     release buf
>
> However, there's a (big) problem here: What Python object does buf2 refer to?
>
> Using the one of "buf" would be too confusing as they point to different
> memory areas. One solution is just setting it to None, perhaps. However
> once #177 is solved one will expect to be able to do

How hard would it be to have a contiguous -> strided (or indirect) buffer copy utility function in Cython, and have this copy be triggered after the myfunc(&buf2) call? The downside would be a two-copy penalty for passing a strided (or indirect) object to the function, but it would then handle the problem above, right? The programmer should use contiguous arrays to avoid the inefficiency. Please correct any misunderstanding.

>     cdef void myfunc(np.ndarray[int, mode="c"] buf)
>
> and work simultaneously with NumPy functions/operators and buffer access,
> and that works nice in itself with only #177, but with the addition of
> copy-in/copy-out one then has a problem.
>
> All of this makes me think about pushing the "new buffer syntax" a bit
> harder and get it started on in your GSoC. With that,
>
>     cdef void myfunc(int[:] buf)
>
> could easily give non-surprising effects for #177 and copy-in/copy-out, as
> the Python object is not "part of the deal".

Let me get this right: the "int[:] buf" is syntactic sugar for a Py_buffer, whereas "object[int] buf" represents a Python object that conforms to the buffer protocol. Access to the "int[:] buf" would always be done at C level (perhaps with a conversion on the indices), whereas access to the "object[int] buf" is by default done through the Python API layer, unless the indices are C ints.
So dealing with just the Py_buffers on the Cython side avoids much of the mucking around with PyObjects, etc., and would be much more efficient, since a Py_buffer is effectively a souped-up C array (N dimensions, strides, suboffsets, etc.). The "int[:] buf" can always be converted into a memoryview object for use in the Python layer.

So, in the example above with the new syntax:

    cdef void myfunc(int[:] buf)

would have the C prototype

    void myfunc(Py_buffer *buf) // + contract on contiguous/int

A call within Cython code with a PyObject argument would require an explicit cast, wouldn't it? So one would have to do

    myfunc(<int[:]>myobject)

which would generate:

    Py_buffer temp_buffer = acquire buffer from myobject
    /* everything else the same, without object reference worries */

If the programmer wants to pass in a buffer and get something back, do in Cython:

    cdef int[:] pass_buf = <int[:]>myobject
    myfunc(pass_buf)
    # convert pass_buf to other object, use its contents, etc.

> Your GSoC would then Cython-side consist more or less of
>
> a) #177, with new syntax
> b) A generic mechanism for automatic coercion between buffers of different
> modes. That is:
>
>     cdef int[::contiguous] buf # new syntax for mode="c" w/o Python obj?
>     buf = some_object
>
> Here, if the buffer of some_object is not contiguous, a contiguous copy
> will be made! And when releasing buf, it would be copied back.
>
> This would make the parts necessary for Fortran support tremendously
> useful elsewhere for what I believe will not be much extra effort.
>
> (Though I could help out with the parts of those not needed for Fortran
> support in order to not derail your project.)

I'll need to digest this a bit, but I like it.

Kurt

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
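
As a rough illustration of the copy-in/copy-out idea discussed in this thread, here is a minimal sketch in plain Python, using the built-in memoryview as a stand-in for a Py_buffer. It is not Cython's implementation and not what #177 specifies. The names call_with_contiguous and double_in_place are made up for this example, and the sketch assumes a 1-D buffer whose format code is also a valid array typecode (such as 'i').

```python
from array import array


def call_with_contiguous(func, view):
    """Hypothetical helper: call func on a contiguous buffer.

    Assumes `view` is a 1-D memoryview and `func` mutates its argument
    in place. Returns True if a temporary copy was needed.
    """
    if view.contiguous:
        func(view)  # fast path: the callee works on the original memory
        return False
    # Copy-in: pack the strided elements into a fresh contiguous array.
    tmp = array(view.format, view.tolist())
    func(memoryview(tmp))
    # Copy-out: write the results back through the strided view.
    view[:] = memoryview(tmp)
    return True


def double_in_place(buf):
    """Stand-in for a function with a 'contiguous int buffer' contract."""
    for i in range(len(buf)):
        buf[i] *= 2


data = array('i', [1, 2, 3, 4, 5, 6])
strided = memoryview(data)[::2]  # every other element, not contiguous
copied = call_with_contiguous(double_in_place, strided)
```

After the call, only the elements reachable through the strided view (indices 0, 2 and 4 of data) have been doubled, at the cost of the two copies mentioned above. Passing memoryview(data) directly takes the fast path with no copy.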
