On Thursday 05 March 2009, Dag Sverre Seljebotn wrote:
> > At first sight, having a kind of Numexpr kernel inside Cython would
> > be great, but given that you can already call Numexpr from both
> > Python and Cython, I wonder what the advantage of doing so would be. As
> > I see it, it would be better to have:
> >
> > c = numexpr.evaluate("a + b")
> >
> > in the middle of Cython code than just:
> >
> > c = a + b
> >
> > in the sense that the former would allow the programmer to see
> > whether Numexpr is called explicitly or not.
>
> The former would need to invoke the parser etc., which one would
> *not* need to do when one has the Cython compilation step.
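To make the parsing point concrete, here is a small Python sketch. `evaluate_string` is a hypothetical toy, not numexpr's real machinery: it parses the expression text with the standard `ast` module every time it is called. `add_compiled` stands for code whose expression structure was already known when the source was compiled, so nothing has to be parsed at run time.

```python
import ast
import operator

import numpy as np

# Hypothetical toy evaluator (NOT numexpr's implementation). Like any
# string-based API, it must turn the text into a tree before computing.
_BINOPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate_string(expr, **arrays):
    tree = ast.parse(expr, mode="eval")  # the run-time parsing step

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Name):
            return arrays[node.id]  # look up operands by variable name
        if isinstance(node, ast.Constant):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
            return _BINOPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    return walk(tree)


def add_compiled(a, b):
    # What a compiler sees for "c = a + b": the structure is fixed in the
    # source, so no expression string is parsed when this runs.
    return a + b


a = np.arange(6.0).reshape(2, 3)
b = np.ones((2, 3))
c1 = evaluate_string("a + b", a=a, b=b)
c2 = add_compiled(a, b)
print(np.array_equal(c1, c2))  # True
```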
Ah, yes. That's a good point.

> When I
> mention numexpr it is simply because work has already gone into it
> to optimize these things; that experience could hopefully be kept,
> while discarding the parser and opcode system.
>
> I know too little about these things, but look:
>
> Cython can relatively easily transform things like
>
> cdef int[:,:] a = ..., b = ...
> c = a + b * b
>
> into a double for-loop with c[i,j] = a[i,j] + b[i,j] * b[i,j] at its
> core. A little more work could have it iterate the smallest dimension
> innermost dynamically (in strided mode).
>
> If a and b are declared as contiguous arrays and "restrict", I
> suppose the C compiler could do the most efficient thing in a lot of
> cases? (I.e. "cdef restrict int[:,:,"c"]" or similar)

Agreed.

>
> However if one has a strided array, numexpr could still give an
> advantage over such a loop. Or?

Well, I suppose that, provided Cython could perform the for-loop
transformation, supporting strided arrays would be relatively trivial,
and the performance would be similar to numexpr's in this case.

The case of unaligned arrays would be a bit different, as numexpr uses
the following trick: whenever an unaligned array is detected, a new
'copy' opcode is issued so that, for each data block, a copy is made to
get the data aligned. As the block sizes are chosen to fit easily in
the CPU's level-1 cache, this copy operation is very fast and has
rather little impact on performance. As I see it, this is the only
situation that would be more complicated to implement natively in
Cython, because it requires non-trivial code for both blocking and
handling opcodes. However, my guess is that unaligned array operands do
not appear in most situations, so perhaps the unaligned-case
optimization would not be so important when implementing this in
Cython.

> But anyway, this is easily one year ahead of us, unless more
> numerical Cython developers show up.
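A rough sketch of the two loop shapes discussed here, written in plain Python/NumPy for readability (a real Cython or numexpr version would run these loops in C). `fused_loop` is the double for-loop described for `c = a + b * b`. `blocked_eval` mimics the blocking-plus-copy idea for unaligned operands. All function names, the 4096-byte block size, and the restriction to 1-D float64 inputs are assumptions for illustration, not numexpr's actual code.

```python
import numpy as np

BLOCK_BYTES = 4096  # assumption: a block size that fits comfortably in L1


def fused_loop(a, b):
    """c = a + b * b as one double loop, with no temporary arrays."""
    c = np.empty_like(a)
    n0, n1 = a.shape
    for i in range(n0):
        for j in range(n1):
            c[i, j] = a[i, j] + b[i, j] * b[i, j]
    return c


def blocked_eval(a, b):
    """a + b * b for 1-D float64 arrays, processed block by block.

    An operand whose data is not aligned is first copied, one block at a
    time, into an aligned scratch buffer (the 'copy' step).
    """
    n = a.shape[0]
    block = BLOCK_BYTES // a.itemsize
    out = np.empty(n, dtype=np.float64)
    # Freshly allocated NumPy buffers are aligned for their dtype.
    scratch_a = np.empty(block, dtype=np.float64)
    scratch_b = np.empty(block, dtype=np.float64)
    for start in range(0, n, block):
        stop = min(start + block, n)
        m = stop - start
        blk_a = a[start:stop]
        blk_b = b[start:stop]
        if not a.flags.aligned:
            scratch_a[:m] = blk_a
            blk_a = scratch_a[:m]
        if not b.flags.aligned:
            scratch_b[:m] = blk_b
            blk_b = scratch_b[:m]
        out[start:stop] = blk_a + blk_b * blk_b
    return out


def unaligned_copy(x):
    """Hypothetical helper: copy x into a buffer offset by one byte."""
    raw = np.zeros(x.size * x.itemsize + 1, dtype=np.uint8)
    u = raw[1:].view(x.dtype)
    u[:] = x
    return u
```

In pure Python the copies cost more than they save; the sketch only shows the structure: a fixed-size block, an optional copy into an aligned scratch buffer, then the arithmetic on data that is now aligned and small enough to stay in cache.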
Cheers,

-- 
Francesc Alted
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion