Clarifying question:  In #177, it says,

"Currently there's no way of creating efficient functions taking a
buffer argument -- for starters cdef functions doesn't support
declaring an argument as a buffer, and even if it did, it would mean a
costly (in this context) buffer acquisition on every call."

More precisely, should it say "There's no way of creating efficient
functions taking a buffer argument /using the buffer syntax/..." since
one could always just have the function take a Py_buffer(*) argument,
right?


On Fri, Apr 24, 2009 at 3:38 AM, Dag Sverre Seljebotn
<[email protected]> wrote:
> (This is a move of my and Kurt's discussion into the public; I'll give
> some context but not that much.)
>
> Kurt: This will go into a broader discussion of buffers in Cython as such
> and not be directly Fortran-relevant. I want to have Fortran support
> integrated in a nice way into Cython, so I think this is necessary to some
> degree, but I'll try not to overdo it.
>
> So, you wanted to target contiguous buffer passing first. And I was
> saying that fixing
>
> http://trac.cython.org/cython_trac/ticket/177
>
> would for the most part amount to what we need. Adding support for
>
>   cdef void myfunc(object[int, mode="c"] buf)
>
> which would have the C signature
>
>   void myfunc(Py_buffer* buf) // + a contract on int/contiguous
>
> and have the caller be responsible for getting the buffer, gets us far.
> It would mean that you could do
>
>   myfunc(myobject)
>
> and it would turn into
>
>   Py_buffer buf = acquire buffer from myobject
>   raise exception if buf is not contiguous
>   myfunc(&buf)
>   release buf

Just stating my knowledge as it is right now -- for

def func2(object[int, mode='c'] buf):
    # body here

The 'buf' is passed in as a Python object and a stack Py_buffer is
generated inside func2's body that acquires the buffer from buf.  So
for 'def' functions the buffer is acquired/released within the
function scope, while for 'cdef' functions it is acquired/released
outside.  I'd imagine that cpdef functions would be outside, too.
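
To make the caller-side sequence concrete (acquire, check the contract, call, release), here is a rough sketch in plain Python, where memoryview stands in for the Py_buffer. This is not what Cython generates; call_with_buffer and total are hypothetical names used only for illustration.

```python
from array import array

def total(buf):
    # Stand-in for the cdef function body: it only ever sees the buffer.
    return sum(buf)

def call_with_buffer(func, obj):
    # Caller-side protocol: acquire, check the contract, call, release.
    view = memoryview(obj)            # acquires the buffer from obj
    try:
        if not view.c_contiguous:
            raise ValueError("buffer is not C-contiguous")
        return func(view)
    finally:
        view.release()                # releases the buffer

data = array('i', [1, 2, 3, 4])
result = call_with_buffer(total, data)
```

The point of doing it this way is that the callee never touches the acquire/release step, which is what would let a cdef function avoid paying for it on every call.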

>
> I first thought that adding one component more would get us all the way:
> Automatic copying into contiguous buffers. So assuming #177 is
> implemented, one would then move on to having it instead turn into:
>
>   Py_buffer buf = acquire buffer from myobject
>   if (buf is contiguous) {
>     myfunc(&buf);
>   } else {
>     Py_buffer buf2 = make contiguous copy of buf
>     myfunc(&buf2)
>   }
>   release buf
>
> However, there's a (big) problem here: What Python object does buf2 refer to?
>
> Using the one of "buf" would be too confusing as they point to different
> memory areas. One solution is just setting it to None, perhaps. However
> once #177 is solved one will expect to be able to do

How hard would it be to add a contiguous -> strided (or indirect)
buffer copy utility function to Cython, and have this copy triggered
after the myfunc(&buf2) call?  The downside would be a two-copy penalty
for passing a strided (or indirect) object to the function, but it
would then handle the problem above, right?  Programmers who want to
avoid the inefficiency should pass contiguous arrays.

Please correct any misunderstanding.
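
Roughly what I have in mind, again sketched in plain Python with memoryview rather than real Cython. It assumes a 1-D buffer only, and call_with_copy and double_in_place are hypothetical names.

```python
from array import array

def double_in_place(buf):
    # Stand-in for a function that requires a contiguous int buffer.
    for i in range(len(buf)):
        buf[i] *= 2

def call_with_copy(func, view):
    # Pass contiguous buffers straight through; otherwise copy in and out.
    if view.c_contiguous:
        func(view)
        return
    tmp = array(view.format, view.tolist())   # copy 1: strided -> contiguous
    func(memoryview(tmp))
    for i, value in enumerate(tmp):           # copy 2: contiguous -> strided
        view[i] = value

data = array('i', range(1, 7))
strided = memoryview(data)[::2]               # views every second element
call_with_copy(double_in_place, strided)
```

After the call only the elements seen through the strided view have changed in data, so the caller gets the same result it would have had with a contiguous buffer, at the cost of the two copies.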

>
>   cdef void myfunc(np.ndarray[int, mode="c"] buf)
>
> and work simultaneously with NumPy functions/operators and buffer access,
> and that works nice in itself with only #177, but with the addition of
> copy-in/copy-out one then has a problem.
>
> All of this makes me think about pushing the "new buffer syntax" a bit
> harder and get it started on in your GSoC. With that,
>
>   cdef void myfunc(int[:] buf)
>
> could easily give non-surprising effects for #177 and copy-in/copy-out, as
> the Python object is not "part of the deal".

Let me get this right:  the "int[:] buf" is syntactic sugar for a
Py_buffer, whereas "object[int] buf" represents a Python object that
conforms to the buffer protocol.  Access to the "int[:] buf" would be
always done at C level (perhaps a conversion on the indices), whereas
access to the "object[int] buf" is by default done through the Python
API layer, unless the indices are C ints.  So dealing with just the
Py_buffers on the Cython side avoids much of the mucking around with
PyObjects, etc, and would be much more efficient, since a Py_buffer is
effectively a souped-up C array (N-dimensions, strides, suboffsets,
etc.)  The "int[:] buf" can always be converted into a memoryview
object for use in the Python layer.
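
As a small illustration of the "souped-up C array" view, the shape and strides that a Py_buffer carries can be inspected from Python through memoryview. This is only a sketch; element_index is a hypothetical helper, and the 2 x 3 layout is just an example.

```python
from array import array

flat = memoryview(array('i', range(6)))
# Reinterpret the same memory as a 2 x 3 C-contiguous block of ints.
grid = flat.cast('B').cast('i', shape=[2, 3])

def element_index(view, i, j):
    # Byte offset from strides, converted to an index into the flat data.
    byte_offset = i * view.strides[0] + j * view.strides[1]
    return byte_offset // view.itemsize

index = element_index(grid, 1, 2)
```

Indexing at the C level amounts to this stride arithmetic on the raw data pointer, with no Python objects involved.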

So, in the example above with the new syntax:

cdef void myfunc(int[:] buf)

would have the C prototype

void myfunc(Py_buffer *buf) // + contract on contiguous/int

A call within Cython code with a PyObject argument would require an
explicit cast, wouldn't it?

So one would have to do

myfunc(<int[:]>myobject)

Which would generate:

Py_buffer temp_buffer = acquire buffer from myobject
/* everything else the same, without object reference worries */

If the programmer wants to pass in a buffer and get something back,
the Cython code would be:

cdef int[:] pass_buf = <int[:]>myobject
myfunc(pass_buf)
# convert pass_buf to other object, use its contents, etc.

>
> Your GSoC would then Cython-side consist more or less of
>
> a) #177, with new syntax
> b) A generic mechanism for automatic coercion between buffers of different
> modes. That is:
>
> cdef int[::contiguous] buf # new syntax for mode="c" w/o Python obj?
> buf = some_object
>
> Here, if the buffer of some_object is not contiguous, a contiguous copy
> will be made! And when releasing buf, it would be copied back.
>
> This would make the parts necessary for Fortran support tremendously
> useful elsewhere for what I believe will not be much extra effort.
>
> (Though I could help out with the parts of those not needed for Fortran
> support in order to not derail your project.)

I'll need to digest this a bit, but I like it.

Kurt
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
