If anything, your proposal has at least been interesting enough to
generate a lot of discussion. Would a fair summary be that you're
proposing a more full-featured wrapping of PEP 3118's bufferinfo
struct (where all the semantics/operation logic are embedded in the
Cython compiler and the code emitted inline, rather than living in an
external class like ndarray)? And essentially you want these to act
as Fortran arrays, a native type in the language.
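
To make sure I'm reading it right, here is a minimal sketch of what I
take that to mean; the double[:, :] spelling is just my guess at the
proposed syntax and may not match the proposal exactly:

def scale(double[:, :] a, double factor):
    # 'a' would accept any object exporting a PEP 3118 buffer; the
    # indexing below compiles to inline C, with no ndarray involved.
    cdef Py_ssize_t i, j
    for i in range(a.shape[0]):
        for j in range(a.shape[1]):
            a[i, j] = a[i, j] * factor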

I am not opposed to this feature, even though it is somewhat off the
path of being a good, optimizing Python compiler with easy glue to
external code, as long as it doesn't intrude on the language too
much. Mostly this is because there is so much interest in using
Cython in a numerical-processing context. I do, however, have some
reservations.

First, it sounds a bit like you're proposing to essentially
re-implement NumPy. Would it just be slicing and arithmetic, or where
would you draw the line before it no longer belongs in a compiler but
rather in a library? More below:

On Jun 10, 2009, at 11:57 AM, Dag Sverre Seljebotn wrote:

> Brian Granger wrote:
>> Dag,
>>
>> I quickly glanced through the proposal and have two big picture  
>> questions:
>>
>> * What will this make possible that is currently not possible?

This was originally my first question too, but you beat me to it.

> 1) Efficient slices

Is the inefficiency just in the object creation and Python indexing  
semantics? It's still O(1), right? Same with the other operations. (I  
guess there's also a question of result type.)
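
For concreteness, the contrast I picture (the first form is today's
buffer support; the second uses the proposed syntax as I understand
it, so take it as a guess):

cimport numpy as np

def first_row_now(np.ndarray[double, ndim=2] a):
    # Slicing goes back through Python and constructs a new ndarray
    # object, even though the result is just a pointer plus strides.
    row = a[0, :]

def first_row_proposed(double[:, :] a):
    # Presumably this would compile down to filling in a small C
    # struct, with no Python object allocation at all.
    cdef double[:] row = a[0, :]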

> 2) Leave the road open for memory bus friendlier arithmetic (re: Hoyt
> Koepke's project proposal)

Could you provide a bit more of a concrete example here? Avoiding the
allocation of temporary arrays comes to mind; is there more?
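
Concretely, I take the memory-bus argument to be something like this
(a sketch in the proposed syntax; whether the expression form could
ever compile to the fused loop is exactly the open question):

def fused(double[:] a, double[:] b, double[:] c, double[:] d):
    # NumPy-style evaluation of a = b * c + d builds a full-size
    # temporary for b * c and then a second array for the sum, making
    # extra passes over memory. A fused loop touches each element
    # exactly once:
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        a[i] = b[i] * c[i] + d[i]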

> 3) Automatic contiguous copies in/out if a function is coded to
> work on contiguous memory
>
> Why can't this be done currently?
>
>  * Cython contains no in-compiler support for NumPy, and so cannot
> know how to create new underlying ndarray objects.
>
>  * All optimizations need to hard-code semantics at compile-time.
> With single-element indexing it seemed fair to assume the usual
> semantics of zero-based indexing etc., but with slices things get
> worse (which kind of object is returned) and with arithmetic it is
> downright impossible (what does * do again?)
>
> That's not to say there's not other options:
> 1) We could hard-code support for NumPy only, and only allow
> ndarray and not subclasses thereof.
>
> 2) We could invent some new protocol/syntax for defining compile-time
> semantics for all relevant operations.

Here I am torn--I don't like defining compile-time semantics because
it goes against the whole OO style of inheritance (and feels even
further removed from the very dynamic, late-binding Python runtime).
I don't like option (1) either, though.

Another idea: have you thought of using NumPy as the backend? I.e.
an int[:,:] is any bufferinfo-supporting object, but if one needs to
be created, you create it via an ndarray. This could (potentially)
facilitate a lot more code reuse (especially for operations that are
more complicated than a single loop over the data). (It might be
messier than implementing int[:,:] directly, though.) Suppose one
develops a vectorized version of ndarray; could that be a drop-in
replacement?
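
Roughly what I mean, as a sketch (the np.empty call here is just my
illustration of the allocation the compiler would perform behind the
scenes):

import numpy as np

def diff(double[:] a):
    # Accept any PEP 3118 buffer as input; new storage is obtained by
    # asking NumPy for it, rather than by compiler-managed memory.
    out_obj = np.empty(a.shape[0] - 1, dtype=np.double)
    cdef double[:] out = out_obj
    cdef Py_ssize_t i
    for i in range(out.shape[0]):
        out[i] = a[i + 1] - a[i]
    return out_obj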

The plug-in idea is very interesting, both because it would allow one
to play with different ways to operate on arrays and because it could
separate some of the array-processing logic out of the core compiler
itself.

- Robert


P.S. In terms of componentwise operations vs. concatenate, I think a
raw list type would be very useful too. int[:] has the advantage that
it is easy to encode dimension and other information relevant to
arrays. Memory-managed lists of ints, doubles, structs, etc. would be
something I would very much like to see added, but via another syntax
(e.g. [int] or int[]). Using a keyword like "vector" or "array" is
IMHO a bad idea, as it looks like an ordinary type and bars the user
from ever using that name themselves. Syntax via punctuation makes it
clear that something non-Pythonic is going on here.
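
Something along these lines is what I have in mind (the syntax here
is entirely invented, just to illustrate the shape of it):

cdef int[] counts = [2, 3, 5]    # memory-managed, growable list of C ints
counts.append(7)                 # no Python object per element
cdef double[] weights = []       # likewise for doubles, structs, ...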


P.P.S. Were you actually suggesting that when I have

cdef struct X:
     int a
     double b

I could do

cdef X x = ..., y = ..., z
z = x+y

?