Dag Sverre Seljebotn wrote:
> Thanks for your input! You definitely know more about such computations
> than me.
>
> Roland Schulz wrote:
>> Component wise operations without optimization (thus collapsing
>> d=a*b*c*d into one loop instead of 3 and not using temporary arrays)
>> does not give you any speed-up over Numpy for vectorized code with large
>> arrays.
>>
>> For vectorized Numpy code the bottleneck is not the call from Python to
>> C, but the inefficient use of cache because of the temporary arrays.
>
> I don't know enough about this, but these two paragraphs seem slightly
> contradictory to me.
Or did you mean that optimization == collapsing d = a*b*c*d into one loop
instead of three and not using temporary arrays?

It is definitely the plan of CEP 517 that

    cdef int[:] a = ..., b = ..., c = ..., d = ...
    d = a * b * c * d

turns into something very similar to

    cdef size_t tmp1
    cdef int[:] tmpresult = <new array of the right length>
    for tmp1 in range(a.shape[0]):
        tmpresult[tmp1] = a[tmp1] * b[tmp1] * c[tmp1] * d[tmp1]
    d = tmpresult

although broadcasting should be supported and makes it more complicated
(arrays of length 1 are repeated; if d has length 1 it must be
reallocated -- and so on). With multidimensional arrays this becomes more
difficult: there are lots of ugly details concerning broadcasting and
non-contiguous arrays (where the "innermost" dimension must be found at
runtime...).

IMPORTANT NOTE: All of this is far ahead. The only question right now is a
coarse roadmap, and whether this is wanted at all.

--
Dag Sverre
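
For what it's worth, a minimal sketch in plain Python/NumPy of the two
evaluation strategies being contrasted above (the variable names are
illustrative only, not part of CEP 517):

    import numpy as np

    a = np.arange(6, dtype=np.intc)
    b = np.full(6, 2, dtype=np.intc)
    c = np.full(6, 3, dtype=np.intc)
    d = np.full(6, 4, dtype=np.intc)

    # Vectorized evaluation: each '*' allocates a full-size temporary
    # array, so the data is streamed through memory once per operation.
    tmp1 = a * b
    tmp2 = tmp1 * c
    d_vectorized = tmp2 * d

    # Collapsed evaluation: one loop, one pass over the data, no
    # intermediate arrays (apart from the result buffer itself).
    result = np.empty_like(a)
    for i in range(a.shape[0]):
        result[i] = a[i] * b[i] * c[i] * d[i]

    assert (result == d_vectorized).all()

With large arrays each temporary is too big to stay in cache, so it is the
single pass of the collapsed form that avoids the memory-traffic bottleneck
Roland describes.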
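One possible way the length-1 broadcasting case could be handled in such a
generated loop -- this is only a hypothetical sketch, not CEP 517's actual
design, and broadcast_multiply is an invented name -- is to give a length-1
operand a zero index step so that it is effectively repeated:

    import numpy as np

    def broadcast_multiply(a, b):
        # Hypothetical 1-D sketch: a length-1 operand is "repeated" by
        # giving it a zero index step; the output has the broadcast length.
        n = max(a.shape[0], b.shape[0])
        if a.shape[0] not in (1, n) or b.shape[0] not in (1, n):
            raise ValueError("shapes cannot be broadcast together")
        a_step = 0 if a.shape[0] == 1 else 1
        b_step = 0 if b.shape[0] == 1 else 1
        out = np.empty(n, dtype=np.result_type(a, b))
        for i in range(n):
            out[i] = a[i * a_step] * b[i * b_step]
        return out

    x = np.array([1, 2, 3, 4])
    y = np.array([10])          # length 1, broadcast across x
    assert (broadcast_multiply(x, y) == x * y).all()

The genuinely hard part is the multidimensional, non-contiguous case, where
the dimension to treat as innermost may only be known at run time.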