On Jun 12, 2009, at 12:30 AM, Dag Sverre Seljebotn wrote:

> A quick answer to some of the points.
>
> Robert Bradshaw wrote:
>> First, it sounds a bit like you're proposing to essentially
>> re-implement NumPy. Would it just be slicing and arithmetic, or
>> where would you draw the line between what belongs in the compiler
>> and what belongs in a library? More below:
>
> At this point, I should stress that what I'm after is a roadmap (how
> do I deal with requests as they come up, like Hoyt's proposed
> project?). I won't have time to do more than the basic array stuff
> (to make better use of Kurt's work) this time around -- which is
> still very useful for exchanging array data with C/Fortran code.
>
> It would be "reimplementing" NumPy as far as the core array API goes,
> but the implementation would be totally different, as we have a
> compilation stage. If anything, it is reimplementing Fortran -- but
> Fortran desperately needs more alternatives :-)
>
> As for the library, the CEP sketches full interoperability with it;
> you would likely still do
>
> arr = np.dot(arr, arr)
>
> for matrix multiplication.
>
> What to include? I think slicing, arithmetic (with "broadcasting",
> i.e. dimensions of length 1 are repeated to make arrays conform), and
> transpose (arr.T). No "sum()" member function or similar.

That sounds reasonable.
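
For concreteness, I take it usage would look roughly like this (just a
sketch of my reading of the CEP; "obj" is a stand-in for any
buffer-exporting object, and the exact syntax is of course still open):

cdef int[:,:] a = obj            # acquires obj's buffer once
cdef int[:,:] b = a[1:5, :]      # O(1) slice, no copy
cdef int[:,:] c = a + b[0:1, :]  # elementwise add; the length-1 axis broadcasts
cdef int[:,:] d = a.T            # transpose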

> In addition perhaps high-level utilities for looping, such as
> "ndenumerate" etc:
>
> cdef int[:,:] arr = ...
> for (i, j), value in cython.array.ndenumerate(arr):
>      ...
>
> These would be quite nicely contained in cython.array though (and
> again there's no direct NumPy equivalent that isn't horribly slow due
> to Python-level iterators).

Something like this could certainly benefit from compile-time  
optimizations when the dimensions are fixed.
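
i.e. with the number of dimensions known at compile time,
cython.array.ndenumerate ought to lower to plain nested loops,
something roughly like (sketch only):

cdef int[:,:] arr = ...
cdef Py_ssize_t i, j
cdef int value
for i in range(arr.shape[0]):
    for j in range(arr.shape[1]):
        value = arr[i, j]
        ...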

>
>
>>
>> On Jun 10, 2009, at 11:57 AM, Dag Sverre Seljebotn wrote:
>>
>>> Brian Granger wrote:
>>>> Dag,
>>>>
>>>> I quickly glanced through the proposal and have two big picture
>>>> questions:
>>>>
>>>> * What will this make possible that is currently not possible?
>>
>> This was originally my first question too, but you beat me to it.
>>
>>> 1) Efficient slices
>>
>> Is the inefficiency just in the object creation and Python indexing
>> semantics? It's still O(1), right? Same with the other operations. (I
>> guess there's also a question of result type.)
>
> Well, and then the resulting buffer is acquired and checked for data
> type. But essentially, yes.
>
> This is purely a matter of notational convenience -- will it hurt to
> have that O(1) cost in your almost-inner loop taking a slice of 50
> elements, or do you pass "start" and "end" around in your function?

Your inner loop would typically be over largish arrays, so it  
probably wouldn't matter much, but I see your point.
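
To spell out the comparison (the names here are just placeholders):

# convenient: take an O(1) slice on every pass of the outer loop
for i in range(n):
    process(arr[i, start:end])

# versus threading the bounds through by hand
for i in range(n):
    process_range(arr, i, start, end)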

>>> 2) Leave the road open for memory bus friendlier arithmetic (re:  
>>> Hoyt
>>> Koepke's project proposal)
>>
>> Could you provide a bit more of a concrete example here? There's
>> avoiding the allocation of temporary arrays; is there more?
>
> For simple functions like addition, most systems today are
> constrained by memory bus speed, not CPU speed.

Yep.

> Consider
>
> B = A + A + A + A + A + A
>
> With NumPy, the data of A has to travel over the bus 6 times; with
> the loops fused, only once, as it would turn into
>
> B = new memory
> for idx in ...
>     B[idx] = A[idx] + A[idx] + ...
>
> This can mean a dramatic speed increase as data doesn't have to enter
> the cache more than once.

Well, it's unlikely that you would have 6 copies of the same data to  
add, but it would save on store/load in the target array. The ability  
to combine multiple operations into a single pass is a powerful one.
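
Spelling it out for my own benefit: NumPy evaluates that expression
pairwise, so the intermediates stream through memory as well, roughly

tmp1 = A + A        # A read twice, tmp1 written out
tmp2 = tmp1 + A     # tmp1 read back, A read a third time
tmp3 = tmp2 + A
tmp4 = tmp3 + A
B    = tmp4 + A     # in total A is read 6 times, plus 4 temporaries

whereas the fused loop reads each element of A from memory only once.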

> (NumPy can't do this, but there is numexpr, which allows it:
>
> B = numexpr.evaluate("A+A+...")
>
> using a bytecode interpreter and working in cache-sized blocks.)

Didn't know about that, that's kind of neat. Have you looked at how  
it works for ideas?
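
For anyone else following along, the usage is roughly (untested, going
by the numexpr docs):

import numpy as np
import numexpr as ne

A = np.random.rand(10**7)
# evaluated in cache-sized blocks in a single pass, no full-size temporaries
B = ne.evaluate("A + A + A + A + A + A")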

>>> 3) Automatic contiguous copies in/out if a function is coded to
>>> work on
>>> contiguous memory
>>>
>>> Why can't this be done currently?
>>>
>>>  * Cython contains no in-compiler support for NumPy, and so cannot
>>> know
>>> how to create new underlying ndarray objects.
>>>
>>>  * All optimizations need to hard-code semantics at compile-time.  
>>> With
>>> single-element indexing it seemed fair to assume the usual
>>> semantics of
>>> zero-based indexing etc., but with slices things get worse (which
>>> kind of
>>> object is returned) and with arithmetic downright impossible (what
>>> does *
>>> do again?)
>>>
>>> That's not to say there's not other options:
>>> 1) We could hard-code support for NumPy only, and only allow
>>> ndarray and
>>> not subclasses thereof.
>>>
>>> 2) We could invent some new protocol/syntax for defining compile- 
>>> time
>>> semantics for all relevant operations.
>>
>> Here I am torn--I don't like defining compile-time semantics because
>> it goes against the whole OO style of inheritance (and feels even
>> more remote than the very dynamic, late-binding Python runtime). I
>> don't like option (1) either though.
>>
>> Another idea: have you thought of using NumPy as the backend? I.e.
>> an int[:,:] is any bufferinfo-supporting object, but if one needs to
>> be created you create it via an ndarray? This could (potentially)
>> facilitate a lot more code reuse (especially for operations that are
>> more complicated than a single loop over the data). (Might be messier
>> than implementing int[:,:] directly though.) Suppose one develops a
>> vectorized version of ndarrays, could that be a drop-in replacement?
>
> ? ndarrays are vectorized? But subclasses may not be.

I'm just thinking about taking advantage of improvements/ 
optimizations from ndarrays. But a lot of the stuff you're talking  
about is clearly and optimally O(n) so this is less of an issue  
(especially if you make it pluggable to support future parallelization).
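
In code terms, I was picturing something roughly like this (just a
sketch; some_buffer_object is a placeholder):

# any object exposing the buffer interface can back an int[:,:] ...
cdef int[:,:] a = some_buffer_object

# ... but when a new array has to be materialized (say for a + a), the
# compiler would construct an np.ndarray behind the scenes and acquire
# its buffer, rather than malloc'ing raw memory itself
cdef int[:,:] b = a + a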

> I've thought about it. If arithmetic is implemented on arrays, the
> only thing that seems to be reused this way is malloc/free of a
> memory area, though. Better to lose the dependency then.

OK, makes sense. Wanted to know your thoughts at least.

- Robert

_______________________________________________
Cython-dev mailing list
Cython-dev@codespeak.net
http://codespeak.net/mailman/listinfo/cython-dev
