On Friday 22 May 2009 13:52:46 Andrew Friedley wrote:
> (sending again)
>
> Hi,
>
> I'm the student doing the project.  I have a blog here, which contains
> some initial performance numbers for a couple test ufuncs I did:
>
> http://numcorepy.blogspot.com
>
> It's really too early yet to give definitive results though; GSoC
> officially starts in two days :)  What I'm finding is that the existing
> ufuncs are already pretty fast; it appears right now that the main
> limitation is memory bandwidth.  If that's really the case, the
> performance gains I'll get will be through cache tricks (non-temporal
> loads/stores), reducing memory accesses and using multiple cores to get
> more bandwidth.
>
> Another alternative we've talked about, and which I may (more and more
> likely) look into, is composing multiple operations together into a single
> ufunc.  Again, the main idea is that memory accesses can be reduced or
> eliminated.

IMHO, composing multiple operations together is the most promising avenue for 
leveraging current multicore systems.
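
To make the idea concrete, here is a rough sketch in plain C (not NumPy's 
actual ufunc machinery, just an illustration) of why fusing two element-wise 
operations into one loop pays off: the two-pass version streams every operand 
through memory twice and needs a temporary array, while the fused version 
touches each array exactly once.

#include <stddef.h>

/* Two separate "ufuncs": tmp = a*b, then out = tmp + c.
 * Every element travels through memory twice, plus a temporary. */
void mul_then_add(const double *a, const double *b, const double *c,
                  double *tmp, double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) tmp[i] = a[i] * b[i];
    for (size_t i = 0; i < n; i++) out[i] = tmp[i] + c[i];
}

/* Fused version: out = a*b + c in one pass, no temporary, roughly
 * half the memory traffic once the arrays exceed the cache. */
void fused_mul_add(const double *a, const double *b, const double *c,
                   double *out, size_t n)
{
    for (size_t i = 0; i < n; i++) out[i] = a[i] * b[i] + c[i];
}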

Another interesting approach is to implement costly operations (costly from 
the point of view of CPU resources, namely transcendental functions like sin, 
cos or tan, but also others like sqrt or pow) in a parallel way.  If, in 
addition, you can combine this with vectorized versions of them (by using the 
widespread SSE2 instruction set; see [1] for an example), then you should be 
able to achieve really good results (at least Intel did with its VML 
library ;)

[1] http://gruntthepeon.free.fr/ssemath/
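
Just to illustrate what I mean (this is a sketch, not VML or the code from 
[1]): a costly element-wise operation can be both vectorized with SSE 
intrinsics and spread over the cores with OpenMP.  I use sqrt below because 
SSE2 has a hardware instruction for it; a transcendental like sin would call 
a software routine such as the sin_ps() from [1] in place of _mm_sqrt_ps().

#include <emmintrin.h>   /* SSE/SSE2 intrinsics */
#include <math.h>
#include <stddef.h>

void vec_sqrt(const float *in, float *out, size_t n)
{
    long nvec = (long)(n - n % 4);   /* elements handled 4 at a time */

    /* OpenMP splits the index range over the cores; inside each
     * chunk, one SSE instruction processes four floats at once. */
    #pragma omp parallel for
    for (long i = 0; i < nvec; i += 4) {
        __m128 x = _mm_loadu_ps(in + i);
        _mm_storeu_ps(out + i, _mm_sqrt_ps(x));
    }

    /* Scalar tail for the last n % 4 elements. */
    for (size_t i = (size_t)nvec; i < n; i++)
        out[i] = sqrtf(in[i]);
}

(Compile with something like gcc -O2 -msse2 -fopenmp.)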

Cheers,

-- 
Francesc Alted