On Sun, Apr 26, 2009 at 12:30 AM, Dag Sverre Seljebotn <[email protected]> wrote:
> Kurt Smit wrote:
>> Clarifying question: In #177, it says,
>>
>> "Currently there's no way of creating efficient functions taking a
>> buffer argument -- for starters cdef functions doesn't support
>> declaring an argument as a buffer, and even if it did, it would mean a
>> costly (in this context) buffer acquisition on every call."
>>
>> More precisely, should it say "There's no way of creating efficient
>> functions taking a buffer argument /using the buffer syntax/..." since
>> one could always just have the function take a Py_buffer(*) argument,
>> right?
>
> Sure :-) But if you go that route, you need to access it by manually
> multiplying strides and calculating offsets etc.; and when I say "buffer
> argument" I refer to "special Cython buffers" :-) Py_buffer is just a
> normal struct seen from Cython's perspective.
Gotcha -- it's straightened out in my mind. Not suggesting actually going
that route, just wanted to clarify.

>
>> On Fri, Apr 24, 2009 at 3:38 AM, Dag Sverre Seljebotn
>> <[email protected]> wrote:
[snip]
>>
>> The 'buf' is passed in as a Python object and a stack Py_buffer is
>> generated inside func2's body that acquires the buffer from buf. So
>> for 'def' functions the buffer is acquired/released within the
>> function scope, while for 'cdef' functions it is acquired/released
>> outside. I'd imagine that cpdef functions would be outside, too.
>
> You are right about def. There is no support for cdef or cpdef currently,
> but what you say for cdef would be #177. cpdef would need to be
> in-between: it generates a cdef function which would have the #177
> behaviour, and a def wrapper which would acquire the buffer.

Again, thanks. I should have made it clear I was talking about the end
state after things are done, not the current state of Cython. The cpdef
case makes sense.

>>> I first thought that adding one component more would get us all the way:
>>> automatic copying into contiguous buffers. So assuming #177 is
>>> implemented, one would then move on to having it instead turn into:
>>>
>>> Py_buffer buf = acquire buffer from myobject
>>> if (buf is contiguous) {
>>>     myfunc(&buf);
>>> } else {
>>>     Py_buffer buf2 = make contiguous copy of buf
>>>     myfunc(&buf2);
>>> }
>>> release buf
>>>
>>> However, there's a (big) problem here: What Python object does buf2
>>> refer to?
>>>
>>> Using the one of "buf" would be too confusing as they point to different
>>> memory areas. One solution is just setting it to None, perhaps. However
>>> once #177 is solved one will expect to be able to do
[snip]
> But the problem with there not being a Python object to fill in for buf2's
> Py_buffer remains. The buffer protocol doesn't define any way of creating
> a new buffer of the same Python object type -- i.e.
> we couldn't create a
> new NumPy ndarray in the right way without adding an additional protocol.
> Passing buf's Python object on buf2's behalf would just be wrong, as then
> accessing the buffer through e.g. slices (which goes through the Python
> layer, at least currently) would access a different memory area than
> through item indexing...

I see. So having two ways of getting to and modifying the data (fast
through the buffer with typed indices, or slowly through the Python API
and the PyObject pointer) makes it imperative that whatever buffer is
passed to the function references the same data as the PyObject.

>
>>> All of this makes me think about pushing the "new buffer syntax" a bit
>>> harder and getting it started in your GSoC. With that,
>>>
>>> cdef void myfunc(int[:] buf)
>>>
>>> could easily give non-surprising effects for #177 and copy-in/copy-out,
>>> as the Python object is not "part of the deal".

Revisiting this point: myfunc would get a Py_buffer passed to it, and that
Py_buffer would have a reference to some Python object. If myfunc gets the
buf->obj PyObject and accesses it in a way that would use the Python API
(e.g. a slice on the referenced object), we'd run into the same problems
as above, right? Unless we say that form of access is not allowed or is
undefined. I.e. one could only access and modify the buffer's data through
the buffer's void *buf field, not going through the buf->obj PyObject
reference.

Would it simply be a matter of not allowing the programmer access to the
buffer's PyObject reference in the new buffer syntax? This would work
within the Cython file itself, but an external C function which got the
Py_buffer would have to be aware of the pitfalls.

A better question: if a contiguous Py_buffer is created from a
discontiguous PyObject via a contiguous copy, what should be done with the
buffer's obj reference, given that the PyObject will reference a different
memory area?
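The tension here is observable with today's Python memoryview; the following is a pure-Python sketch rather than Cython (variable names are mine), but it shows both that writes through a view and reads through the owning object must see the same memory, and that a contiguous copy necessarily breaks that link -- which is exactly the buf2.obj problem.

```python
# A view and its owning object share memory: both access paths agree.
owner = bytearray(b"abcdef")
view = memoryview(owner)              # stand-in for the acquired Py_buffer
view[0] = ord("A")                    # typed write through the buffer...
assert owner == bytearray(b"Abcdef")  # ...visible through the Python object

# A strided (discontiguous) view, as produced by slicing with a step:
strided = memoryview(owner)[::2]
assert not strided.c_contiguous

# Making it contiguous requires a copy -- and the copy's owner is a
# *new* object. Writes to the copy are no longer visible through the
# original object, so handing out the original obj for the copy's
# buffer would lie about what memory it describes.
contig = memoryview(bytearray(strided.tobytes()))
assert contig.c_contiguous
contig[0] = ord("z")
assert owner[0] == ord("A")           # original untouched; link is broken
```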
Would this be 'out of bounds', since it is understood that by using a
Py_buffer one must go through the buffer's void *buf field? If slicing or
buffer operations were supported, we'd have to explicitly support them
ourselves in Cython, right? This would be a future enhancement.

> This is a digression:
>
> Actually I was planning on having int[:] mean strided, and perhaps
> int[::1] or something like that mean contiguous. One could use the third
> field for any kind of stride configuration:
>
> int[:,::1] - C-contiguous
> int[::1,:] - Fortran-contiguous
> int[:,:]   - Strided
>
> These would be magic short-hands for more explicit specifications. Some more:
>
> int[::strided, ::indirect] - Matrix stored as pointers to strided columns
> int[::full, ::1] - First index could use any scheme (if-test required to
> see if suboffset is -1 or not), while second index is always contiguous.
>
> It just means the syntax is extensible; we would start with only
> supporting contiguous.
[snip]
> Ah, OK, now I see where you are heading with the explicit cast. No, I
> didn't think of it like this. Py_buffer is just a reference to existing
> memory -- we can "fake" that memory being contiguous by copying for a
> moment, but that shouldn't change semantics -- an "int[:]" variable is a
> reference, not allocated memory.
>
> This is very different from Fortran, but matches the way Python works,
> where everything is a reference.
>
> I.e. you could also do
>
> myfunc(myobject)
> cdef int[:] buf = myobject
>
> and buf would get the output of myfunc. And:
>
> cdef int[:] a = ...
> cdef int[:] b = a
> b[3] = 2 # also shows up in a[3]
>
> and so on.
>
> Note that in general, we should support all cases for cdef functions:
>
> cdef func(int[::full] buf) # or something like that
>
> would never require copying as it would support all 1D buffers. But if
> you do
>
> cdef func(int[::1] buf)
>
> then the buffer is forced to be contiguous, if necessary by copying in
> and out.

I'm with you on the above.
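As a side note, the distinctions these shorthands encode (C-contiguous vs. strided, and view-as-reference semantics) are already observable through Python's own memoryview. A pure-Python sketch follows; the 4-byte size of a C int is an assumption checked up front, and the variable names are mine.

```python
import struct

assert struct.calcsize("i") == 4  # assumption: 4-byte C ints

# A 2x3 C-contiguous int buffer -- int[:, ::1] in the proposed syntax.
flat = struct.pack("6i", 1, 2, 3, 4, 5, 6)
mv = memoryview(flat).cast("i", (2, 3))
assert mv.c_contiguous and not mv.f_contiguous
assert mv.strides == (12, 4)  # row stride = 3 ints, item stride = 1 int
assert mv[1, 2] == 6

# Reference semantics: a second view of the same memory sees writes,
# just like "cdef int[:] b = a; b[3] = 2 also shows up in a[3]".
a = memoryview(bytearray(16)).cast("i")  # writable 1-D view of 4 ints
b = a[1:]                                # another view of the same memory
b[2] = 7
assert a[3] == 7                         # the write shows up in a
```

The stride tuple is what the mode shorthands would constrain at compile time: int[:, ::1] asserts strides[-1] equals the item size, while int[:, :] leaves both strides free.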
Would we need to draw up a CEP detailing the new syntax etc. and get it
approved by the community? (Basically expanding
http://wiki.cython.org/enhancements/buffersyntax.)

>>> Your GSoC would then, Cython-side, consist more or less of:
>>>
>>> a) #177, with new syntax
>>> b) A generic mechanism for automatic coercion between buffers of
>>> different modes. That is:
>>>
>>> cdef int[::contiguous] buf # new syntax for mode="c" w/o Python obj?
>>> buf = some_object
>>>
>>> Here, if the buffer of some_object is not contiguous, a contiguous copy
>>> will be made! And when releasing buf, it would be copied back.
>>>
>>> This would make the parts necessary for Fortran support tremendously
>>> useful elsewhere, for what I believe will not be much extra effort.
>>>
>>> (Though I could help out with the parts of those not needed for Fortran
>>> support in order to not derail your project.)
>>
>> I'll need to digest this a bit, but I like it.
>
> Sure. Note that I corrected my b) point in a separate mail.

Right. To put it in one place:

> Anyway, b) above is not good; it raises a lot of questions about semantics
> (what is int[:] really -- a reference to memory or the memory itself --
> what happens if one acquires two buffers simultaneously, do they point
> to the same memory -- etc.). So replace b) above with:
>
> b) A mechanism for automatically making contiguous copy-in/copy-out on
> #177-style method calls if necessary.

What needs to be done to write up the CEP and to start implementing the
new syntax, or a subset of it for GSoC purposes? What blockers are there
that need to be addressed? Would we need to resolve the problem raised in
the section above before starting?

Also, presuming the new buffer syntax is fully in place (not as a result
of my project, which just starts on its implementation), would you expect
it to be a replacement for the old syntax?
We'd need some way to convert a Py_buffer to a PyObject; as you mention,
we could do this in numpy.pxd with a __frombuffer__ method, or by
backporting the memoryview to older Python versions.

Kurt
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
