You seem to be fully up to speed now, happy to see that :-)

Kurt Smith wrote:
> On Sun, Apr 26, 2009 at 12:30 AM, Dag Sverre Seljebotn wrote:
>>>> All of this makes me think about pushing the "new buffer syntax" a bit
>>>> harder and get it started on in your GSoC. With that,
>>>>
>>>>   cdef void myfunc(int[:] buf)
>>>>
>>>> could easily give non-surprising effects for #177 and copy-in/copy-out,
>>>> as
>>>> the Python object is not "part of the deal".
> 
> Revisiting this point:  myfunc would get a Py_buffer passed to it,
> that Py_buffer would have a reference to some python object.  If
> myfunc gets the buf->obj PyObject and accesses it in a way that would
> use the Python API, (e.g. a slice on the referenced object) we'd run
> into the same problems as above, right?  Unless we say that form of
> access is not allowed or is undefined.  I.e. one could only access &
> modify the buffer's data through the buffer's void *buf field, not
> going through the buf->obj PyObject reference.
> 
> Would it be simply a matter of not allowing the programmer access to
> the buffer's PyObject reference in the new buffer syntax?  This would
> work within the Cython file itself, but an external C function which
> got the Py_buffer would have to be aware of the pitfalls.

True.

A)

An interesting question raised here is whether Py_buffer should be 
passed at all. Perhaps
   object[int] would mean passing a Py_buffer
   int[::1] would mean passing struct {void* buf; size_t shape0;}
   int[:] would mean passing struct {void* buf; size_t shape0, stride0;}

Hmm. But refcounting, storing the buffer somewhere, releasing it etc. are 
still needed, aren't they...

So the above doesn't work; I just wanted to make the point that the 
actual shape of the buffer passing is not set in stone. I think 
benchmarks could help here -- the Py_buffer struct is big and you need 
pointer lookups to get what you need onto the stack in the receiver. If 
necessary, Py_buffer could be allocated on the heap, passed as a pointer, 
and used only for refcounting. Like this:

typedef struct {
   Py_buffer buf;
   size_t refcount; // users of buffer, release buffer on 0
} __Pyx_buffer;

And then int[:] would mean passing
struct { __Pyx_buffer* info; void* buf; size_t shape0, stride0; }

(that doesn't add information, but could be faster)
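
To make the refcounting half of this concrete, here is a rough sketch of 
the acquire/release helpers (the names __pyx_buf_incref/__pyx_buf_decref 
are made up for illustration; the wrapper is assumed to have been 
malloc'd when the buffer was first acquired):

from libc.stdlib cimport free
from cpython.buffer cimport PyBuffer_Release

cdef struct __Pyx_buffer:
    Py_buffer buf
    size_t refcount            # users of the buffer; release when this hits 0

cdef inline void __pyx_buf_incref(__Pyx_buffer* b):
    b.refcount += 1

cdef inline void __pyx_buf_decref(__Pyx_buffer* b):
    b.refcount -= 1
    if b.refcount == 0:
        PyBuffer_Release(&b.buf)   # hand the buffer back to the exporter
        free(b)                    # the wrapper itself lives on the heap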

B)

I was thinking of perhaps adding some attributes to this new buffer 
type. One could allow

cdef int[:] buf = ...
print buf.object

which would get the original object from which the buffer was once 
acquired.

But this object could also be set to None. And I think that for 
contiguous copies, as well as for a possible future feature -- casting 
from C pointers:

cdef int[:] buf = <int[:]>someptr

it should be set to None.
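
Spelled out (this is purely proposed semantics, nothing of it compiles 
today, and the variable names are mine):

import numpy as np
from libc.stdlib cimport malloc

cdef int[:] view = np.arange(10, dtype=np.intc)
print view.object                    # the ndarray the buffer was acquired from

cdef int* someptr = <int*>malloc(10 * sizeof(int))
cdef int[:] raw = <int[:]>someptr    # proposed cast from a C pointer
assert raw.object is None            # no originating object to hand back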

> 
> A better question:  if a contiguous Py_buffer is created from a
> discontiguous PyObject with a contiguous copy, what to do with the
> buffer's obj reference, since that PyObject will have a different
> memory area?  Would this be 'out of bounds' since it is understood
> that by using a Py_buffer one must go through the buffer's void *buf
> field?

Hmm. Actually, when doing a contiguous copy, we need to manage that 
memory somehow (and the called function could assign the int[:] to a 
global int[:] var or whatever; I want to start supporting those).

So we can create our own class for that (subclassing memoryview under 
Py2.6+ only).
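
A rough sketch of what that owner class could look like (the name 
_ContigCopy is invented here; the real thing would also hold shape/stride 
information and do the actual copying):

from libc.stdlib cimport malloc, free

cdef class _ContigCopy:
    # Owns the memory backing a contiguous copy.  Any int[:] pointing into
    # this memory keeps a reference to the instance, so the data stays
    # alive until the last view is gone and __dealloc__ runs.
    cdef void* data
    cdef Py_ssize_t nbytes

    def __cinit__(self, Py_ssize_t nbytes):
        self.data = malloc(nbytes)
        if self.data == NULL:
            raise MemoryError()
        self.nbytes = nbytes

    def __dealloc__(self):
        free(self.data)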

> If slicing or buffer operations were supported, we'd have to
> explicitly support it ourselves in Cython, right?  This would be a
> future enhancement.

Yep. And then the underlying object and the int[:] would definitely get 
out of sync; I think that should just be documented as "how it works".

> I'm with you on the above.  Would we need to draw up a CEP detailing
> the new syntax, etc and get it approved by the community?  (basically
> expanding http://wiki.cython.org/enhancements/buffersyntax)

I did raise the issue when I wrote that CEP; the response amounted to 
Robert saying "interesting!" :-) So at least no one was outright 
rejecting it at the time.

There's also a thread on the NumPy list:

http://thread.gmane.org/gmane.comp.python.numeric.general/28439

>> Anyway, b) above is not good, it raises a lot of questions on semantics
>> (what is int[:] really -- a reference to memory or the memory itself --
>> what happens if one acquires two buffers simultaneously, do they point
>> to the same memory -- etc). So replace b) above with:
>>
>> b) A mechanism for automatically making contiguous copy-in/copy-out on
>> #177-style method calls if necessary.
> 
> What needs to be done to write up the CEP and to start implementing
> the new syntax, or a subset of it for the GSoC purposes?  What
> blockers are there that need to be addressed?  Would we need to
> resolve the problem raised in the above section before starting?

I actually think a good starting point right now is to "take a step 
back": we now understand the issues involved, so let's aim for something 
much more modest.

Basically, for the mid-term evaluation let it suffice to do the Fortran 
integration, requiring any passed arrays to be Fortran-contiguous.

We can use

cdef external_func(object[int] foo)

as a syntax for this and pass the Py_buffer as-is without ever copying. 
(So drop copying until int[:] can be introduced to get rid of the object 
implications in the syntax.)

If a copy is needed: Raise an exception.
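
At the C level this amounts to acquiring the buffer with the 
Fortran-contiguity flag set and letting the exporter refuse; roughly (a 
sketch in Cython of what the generated argument-unpacking code would do):

from cpython.buffer cimport PyObject_GetBuffer, PyBuffer_Release, PyBUF_F_CONTIGUOUS, PyBUF_FORMAT

def call_external(foo):
    cdef Py_buffer view
    # The exporter raises (no copy is ever made) if foo cannot expose its
    # data as a Fortran-contiguous buffer.
    PyObject_GetBuffer(foo, &view, PyBUF_F_CONTIGUOUS | PyBUF_FORMAT)
    try:
        pass    # hand view.buf / view.shape to the wrapped Fortran routine here
    finally:
        PyBuffer_Release(&view)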

This isn't as bad as it sounds, as with NumPy arrays you can just call 
copy('F') manually for now. So it will be usable, though I hope we can 
get further after that is in place.
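
In calling code that would simply look like (asfortranarray is equivalent 
to copy('F') whenever a copy is actually needed):

import numpy as np

a = np.arange(12, dtype=np.intc).reshape(3, 4)   # C-contiguous by default
external_func(np.asfortranarray(a))              # fine: Fortran-contiguous copy
external_func(a)                                 # would raise: a copy would be needed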

Then, after the mid-term we can see where we stand; there's plenty of 
time to plan, let things mature, etc.

While we're planning: Do you want me to have a look at the G3 f2py 
source, or will you just attempt some first steps and ping me when you 
see what is needed there or have a question?

> 
> Also, presuming the new buffer syntax is fully in place (not as a
> result of my project, which just starts on its implementation) would
> you expect it to be a replacement for the old syntax?  We'd need some
> way to convert a Py_buffer to a PyObject; as you mention we could do
> this in the numpy.pxd with a __frombuffer__ method, or by backporting
> the memoryview to older Python versions.

I'd like it to take over in the end, but for that to happen we would 
need to implement slices and arithmetic operators in Cython sufficiently 
well, which means it might never happen at the current rate.

But given enough developer hours on this, in the end I hope to be able to do

cdef double my_func(double x): return x*x

def f(arg):
     cdef double[:] buf = arg
     return my_func(buf) + 3 + buf # expands to element-wise for-loop

and so on.
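
Written out by hand, the expansion of that last line would amount to 
something like the following (the NumPy allocation of the result is just 
my placeholder; how the result buffer gets allocated is exactly the kind 
of design question that is still open):

import numpy as np

def f(arg):
    cdef double[:] buf = arg
    cdef double[:] out = np.empty(buf.shape[0], dtype=np.double)
    cdef Py_ssize_t i
    for i in range(buf.shape[0]):
        out[i] = my_func(buf[i]) + 3 + buf[i]    # element-wise loop
    return out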

-- 
Dag Sverre