Robert Bradshaw wrote:
> On Mar 7, 2009, at 2:37 PM, Dag Sverre Seljebotn wrote:
>> My proposal is here:
>> http://wiki.cython.org/enhancements/buffersyntax
>>
>> And the thread from the numpy list here:
>> http://thread.gmane.org/gmane.comp.python.numeric.general/28439
>
> Very interesting, and I like the syntax for the most part (int
> [:,:,"fortran"] seems a bit odd to me, perhaps mode should be a
> keyword).

Consider though that

@cython.locals(a=cython.int[:,:,"fortran"])
def foo(): ...

is legal Python. Dropping Python compatibility for a new syntax like this
seems like a high price to pay.
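
To illustrate why the decorator line above already parses in plain Python:
subscripting with [:,:,"fortran"] just hands __getitem__ a tuple of two
slices and a string. A minimal sketch, where TypeSpec is a hypothetical
stand-in for cython.int (not Cython's actual class) and the "strided"
default is an assumption made for the example:

```python
class TypeSpec:
    """Hypothetical stand-in for cython.int; not Cython's real class."""

    def __init__(self, name):
        self.name = name

    def __getitem__(self, key):
        # Python passes t[:, :, "fortran"] as a single tuple:
        # (slice(None), slice(None), "fortran")
        if not isinstance(key, tuple):
            key = (key,)
        ndim = sum(1 for k in key if isinstance(k, slice))
        modes = [k for k in key if isinstance(k, str)]
        mode = modes[0] if modes else "strided"  # assumed default
        return (self.name, ndim, mode)


int_ = TypeSpec("int")
spec = int_[:, :, "fortran"]   # two dimensions, Fortran mode
```

A mode constant such as cython.buffer.FORTRAN_MODE would arrive in the same
tuple position, so either spelling stays within ordinary Python syntax.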

Perhaps cython.int[:,:,cython.buffer.FORTRAN_MODE] or similar?

> Can you pass them around/have them assigned to global/class-level
> variables? Will they be garbage collected? Also, how much of numpy

Global/class-level support was always the intention for the current
buffers as well; I just never got around to it. So yes, it would just need
to be done.

On the implementation side, they can be considered (small variations on)
Py_buffer structs. The variables themselves will (up to optimization) be
structs which are copied by value; however, they point to Python objects,
which will be refcounted in the process.

a) They will hold a refcounted reference to an underlying object owning
the memory, and b) they will (sometimes) need a separate "acquisition
refcount" so that when making a slice, the buffer will not be released
until the last slice goes out of scope.

All in all a happy mixture of copying by value and reference counting.
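
The "acquisition refcount" idea can be observed with CPython's own
memoryview in Python 3, which behaves in a comparable way. This is only an
analogy, assuming current CPython behaviour, and not the implementation
proposed here: a bytearray refuses to be resized while its buffer is
acquired, and the acquisition only ends when the last slice is released.

```python
data = bytearray(b"abcdef")   # the object owning the memory
view = memoryview(data)       # acquires the buffer
part = view[2:4]              # a slice sharing the same acquisition

view.release()                # the first view goes away...

try:
    data.append(0)            # ...but resizing is still refused,
    locked_after_first_release = False
except BufferError:           # because `part` keeps the buffer acquired
    locked_after_first_release = True

part.release()                # last slice gone: the buffer is released
data.append(0)                # now resizing works
```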

> will you be duplicating, and how easy will it be to turn one of these
> bare buffers into a numpy array? (I do like the easier int* <-> int
> [:] relation though.)

myarr = numpy.array(mybuf)

This is what is currently used for conversion from lists etc.
Implementation-wise this is rather simple (NumPy can support memoryview in
newer releases/Python versions, and until then we can subclass memoryview
to provide the NumPy __array__ protocol which numpy.array supports).

How much duplication: Well at first nothing, but suppose the whole
wanted-list is done (which depends on bringing in e.g. a student):

All in all, it's more a matter of reimplementing small parts of Fortran
than reimplementing NumPy. (I am actually toying with the idea of
outputting and linking in Fortran code, but I don't think it would buy us
much).

a) Slicing is pretty much a duplication of functionality, but doing it at
compile time, for speed, means it must be coded in a different way than in
NumPy.

b) The same goes for arithmetic. The NumPy one-operation-at-a-time
approach isn't the most efficient one, and if we decide to do this
ourselves, it would be in order to do it in a different way, i.e. turn "a
= b + sqrt(c)" into a full inline loop.

c) Functions (log, sqrt, sin, etc.). Currently there's NumPy which only
takes a whole array and C which only takes a single element. What one
needs is functions which can be used on both levels, so that "a = sqrt(c)"
can be used on buffer level but be turned by Cython into a loop calling
sqrt on all elements individually.
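
To make the difference in b) and c) concrete, here is a rough pure-Python
sketch of the two strategies for "a = b + sqrt(c)". The function names are
made up for the illustration, and real generated code would of course be a
C loop rather than Python:

```python
import math
from array import array


def unfused(b, c):
    # One operation at a time: each step walks the data and
    # allocates a full temporary array.
    tmp = array("d", (math.sqrt(x) for x in c))           # tmp = sqrt(c)
    return array("d", (x + y for x, y in zip(b, tmp)))    # a = b + tmp


def fused(b, c):
    # Single inline loop: one pass, no temporary, with the scalar
    # sqrt called on each element individually.
    a = array("d", [0.0]) * len(b)
    for i in range(len(b)):
        a[i] = b[i] + math.sqrt(c[i])
    return a


b = array("d", [1.0, 2.0, 3.0])
c = array("d", [4.0, 9.0, 16.0])
a = fused(b, c)
```

Both functions compute the same values; the fused form is the shape of
loop that "a = b + sqrt(c)" would be turned into.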

All in all, I think arithmetic on buffers needs pluggability to handle
different backends anyway (does one have the commercial Intel MKL library?
And so on.)

At the cost of a very strong NumPy dependency (if this feature is used),
one could of course implement this through NumPy at first, perhaps as one
"backend". I.e. Cython-generated modules with buffer arithmetic would at
first implicitly depend on NumPy (even if all you're trying to do is sum
the contents of two C arrays).

But I'm still thinking about possible solutions for the arithmetic part;
it's the tricky part.

Dag Sverre
