Robert Bradshaw wrote:
> On Jun 12, 2009, at 12:30 AM, Dag Sverre Seljebotn wrote:
>
>> In addition perhaps high-level utilities for looping, such as
>> "ndenumerate" etc:
>>
>> cdef int[:,:] arr = ...
>> for (i, j), value in cython.array.ndenumerate(arr):
>>      ...
>>
>> These would be quite nicely contained in cython.array though (and
>> again have no direct NumPy link that's not horribly slow due to
>> Python-level iterators).
>>     
>
> Something like this could certainly benefit from compile-time  
> optimizations when the dimensions are fixed.
>   
Actually, high-level iteration can be done quite fast even when ndim is 
not fixed, but it's a long story (the main idea is to have slow generic 
code pick out the dimension with the smallest stride, and then run a 
tight inner for-loop over pointers/increments set up by that slower 
code). I'm thinking of an "int[...]" syntax for this, which would allow 
all operations except item indexing.
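
Roughly, for the 2-D case (a hypothetical sketch with made-up names, 
not the planned codegen; the real thing would be generated per dtype 
and handle arbitrary ndim):

cdef void add_one_2d(char *data, Py_ssize_t *shape, Py_ssize_t *strides):
    # Slow generic part: pick the axis with the smallest stride.
    cdef Py_ssize_t inner = 0 if strides[0] < strides[1] else 1
    cdef Py_ssize_t outer = 1 - inner
    cdef Py_ssize_t i, j
    cdef char *row
    for i in range(shape[outer]):
        row = data + i * strides[outer]
        # Fast part: tight pointer loop along the small-stride axis.
        for j in range(shape[inner]):
            (<int*>(row + j * strides[inner]))[0] += 1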

>> Consider
>>
>> B = A + A + A + A + A + A
>>
>> With NumPy, the data of A has to travel over the bus 6 times; with
>> loop unrolling, only once, as it would turn into
>>
>> B = new memory
>> for idx in ...
>>     B[idx] = A[idx] + A[idx] + ...
>>
>> This can mean a dramatic speed increase as data doesn't have to enter
>> the cache more than once.
>>     
>
> Well, it's unlikely that you would have 6 copies of the same data to  
> add, but it would save on store/load in the target array. The ability  
> to combine multiple operations into a single pass is a powerful one.
>   
I'll just note that people do use numexpr to get around it.
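
For the record, here's what the single-pass version would look like for 
three terms in the proposed syntax (an illustrative sketch, not actual 
generated code):

cdef void fused_sum3(int[:, :] A, int[:, :] B):
    # One pass over A: each element crosses the bus once, and all
    # three adds happen while it sits in a register.
    cdef Py_ssize_t i, j
    for i in range(A.shape[0]):
        for j in range(A.shape[1]):
            B[i, j] = A[i, j] + A[i, j] + A[i, j]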

>> (NumPy can't do this, but there is numexpr, which allows this:
>>
>> B = numexpr.evaluate("A+A+...")
>>
>> using a bytecode interpreter and working in cache-sized blocks.)
>>     
>
> Didn't know about that -- that's kind of neat. Have you looked at how
> it works for ideas?
>   
1) It parses the expression into an expression tree.
2) It repeatedly executes that tree on ideally-sized sub-chunks of the 
arrays.

I.e. it trades off block size (bounded by the cache) against the cost 
of walking the expression tree: blocks are small enough to stay in 
cache, but large enough that the tree-interpretation overhead is 
amortized over many elements. With Cython there's no need for this, as 
the expression tree is compiled straight to the format the CPU speaks.
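
In toy Python form, the chunking looks something like this (an 
illustration of the strategy only, not numexpr's actual internals; 
blocked_eval and the block size are made up):

import numpy as np

def blocked_eval(tree, A, out, blockbytes=32 * 1024):
    # Walk the (contiguous) arrays in cache-sized chunks; the whole
    # expression tree is applied to each chunk while it is resident.
    a = A.reshape(-1)
    o = out.reshape(-1)
    n = max(1, blockbytes // A.itemsize)
    for start in range(0, a.size, n):
        o[start:start + n] = tree(a[start:start + n])

# E.g. blocked_eval(lambda x: x + x + x, A, B) for B = A + A + A.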

Now, it might do SSE etc. as well; that's where I see the plugins 
coming in, and code properly refactored out of numexpr could fit there. 
Long-term we might see the recent NumPy CorePy effort kick in -- CorePy 
can compile code into memory (for use with NumPy) or to an assembly 
text file (for linking into Cython code through a plugin).

It would be neat to fold numexpr into Sage BTW :-), i.e.

x, y = var("x, y")
integrate(x**2 + y**2 + x, x)(somenumpyarray)

and run that through numexpr for more efficient evaluation. If I'm to do 
anything myself, I'm as likely to contribute that /before/ working on 
arithmetic in Cython *shrug*.
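
Something along these lines (a rough sketch -- a real bridge would 
translate the expression tree properly rather than string-munging, and 
the handling of rational coefficients is glossed over here):

import numpy as np
import numexpr
from sage.all import var, integrate

x, y = var("x, y")
expr = integrate(x**2 + y**2 + x, x)  # a symbolic antiderivative
xs = np.linspace(0.0, 1.0, 10**6)
ys = np.linspace(0.0, 1.0, 10**6)
# Sage prints '^' for powers; numexpr expects '**'.
result = numexpr.evaluate(str(expr).replace("^", "**"),
                          local_dict={"x": xs, "y": ys})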

But even if dynamic expressions in Sage become a major mode of working 
with numerics, I still think there's always going to be room for Cython 
to provide honest, no-frills compiled code. And it's likely needed 
anyway to "embrace and extend" the Fortran workflow people are used to.

I wanted to ask you about fasteval in Sage -- I think it does something 
similar (with other types), but I can't seem to find any docs on it. 
Which source file should I look in?

Dag Sverre