Dag Sverre Seljebotn wrote:
> Thanks for your input! You definitely know more about such computations 
> than I do.
> 
> Roland Schulz wrote:
>> Component-wise operations without optimization (thus collapsing 
>> d=a*b*c*d into one loop instead of 3 and not using temporary arrays) 
>> does not give you any speed-up over Numpy for vectorized code with large 
>> arrays.
>>
>> For vectorized Numpy code the bottleneck is not the call from Python to 
>> C, but the inefficient use of cache because of the temporary arrays.
> 
> I don't know enough about this, but these two paragraphs seem slightly 
> contradictory to me.

Or did you mean that

optimization == collapsing d=a*b*c*d into one loop instead of 3 and not 
using temporary arrays

?
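For concreteness, here is a rough sketch in Cython (using the proposed 
memoryview syntax) of the difference I read into that remark. The function 
names and the use of NumPy for allocation are just for illustration; this 
is not what CEP 517 would actually generate:

import numpy as np

def unfused(int[:] a, int[:] b, int[:] c, int[:] d):
    # Roughly what vectorized NumPy does today for d = a * b * c * d:
    # each binary operation makes a full pass over the data and writes
    # a temporary array, so the operands go through the cache three times.
    cdef int[:] t1 = np.empty(a.shape[0], dtype=np.intc)
    cdef int[:] t2 = np.empty(a.shape[0], dtype=np.intc)
    cdef int[:] out = np.empty(a.shape[0], dtype=np.intc)
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        t1[i] = a[i] * b[i]
    for i in range(a.shape[0]):
        t2[i] = t1[i] * c[i]
    for i in range(a.shape[0]):
        out[i] = t2[i] * d[i]
    return out

def fused(int[:] a, int[:] b, int[:] c, int[:] d):
    # The collapsed form: a single pass, no temporaries.
    cdef int[:] out = np.empty(a.shape[0], dtype=np.intc)
    cdef Py_ssize_t i
    for i in range(a.shape[0]):
        out[i] = a[i] * b[i] * c[i] * d[i]
    return out

The unfused version allocates two temporaries and streams the data through 
memory three times; the fused version streams it once.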

It is definitely the plan of CEP 517 that

cdef int[:] a = ..., b = ..., c = ..., d = ...
d = a * b * c * d

turn into something very similar to

cdef size_t tmp1
cdef int[:] tmpresult = <new array of the right length>
for tmp1 in range(a.shape[0]):
    tmpresult[tmp1] = a[tmp1] * b[tmp1] * c[tmp1] * d[tmp1]
d = tmpresult

although broadcasting should be supported and makes this more complicated 
(arrays of length 1 are repeated; if d has length 1 it must be reallocated; 
and so on). With multidimensional arrays this becomes more difficult: there 
are lots of ugly details concerning broadcasting and non-contiguous arrays 
(where the "innermost" dimension must be found at runtime...).
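
As a rough 1-D illustration of the broadcasting part (again my own sketch, 
not a description of the actual generated code): an operand of length 1 can 
simply have its index frozen at 0 while the other advances:

import numpy as np

def multiply_broadcast(int[:] a, int[:] b):
    # A step of 0 for a length-1 operand repeats its single element.
    cdef Py_ssize_t n = b.shape[0] if a.shape[0] == 1 else a.shape[0]
    cdef Py_ssize_t sa = 0 if a.shape[0] == 1 else 1
    cdef Py_ssize_t sb = 0 if b.shape[0] == 1 else 1
    cdef int[:] out = np.empty(n, dtype=np.intc)
    cdef Py_ssize_t i
    for i in range(n):
        out[i] = a[i * sa] * b[i * sb]
    return out

The same idea applies per axis in the multidimensional case, but then the 
stride bookkeeping and the choice of the innermost loop get much messier 
(and have to be resolved at runtime for non-contiguous arrays).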

IMPORTANT NOTE: All of this is way ahead; all that is in question now is 
a coarse roadmap, and whether this is wanted at all.

-- 
Dag Sverre