Personally, I think the time would be better spent optimizing the
single-threaded routines and relying on the BLAS and LAPACK
libraries to use multiple cores for the more complex calculations. In
particular, some basic loop unrolling and SSE versions of the
ufuncs would be beneficial. I have some experience writing SSE code
using intrinsics and would be happy to give it a shot if people tell
me which functions I should focus on.
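
To make the idea concrete, here is a rough sketch of the kind of loop I
have in mind: an elementwise float32 add over contiguous arrays,
unrolled by two and using SSE intrinsics. The name add_f32 and its
signature are made up for illustration; this is not NumPy's actual
ufunc inner loop, which also has to handle strides and other dtypes.
It uses unaligned loads so it makes no assumption about alignment, and
it falls back to the plain scalar loop when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical kernel (not NumPy's real inner loop):
   out[i] = a[i] + b[i] for contiguous float32 arrays of length n. */
void add_f32(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;

    assert(n == 0 || (a != NULL && b != NULL && out != NULL));

#if HAVE_SSE
    /* Unrolled by two: each iteration handles 8 floats as two 4-wide vectors. */
    for (; i + 8 <= n; i += 8) {
        __m128 x0 = _mm_loadu_ps(a + i);
        __m128 y0 = _mm_loadu_ps(b + i);
        __m128 x1 = _mm_loadu_ps(a + i + 4);
        __m128 y1 = _mm_loadu_ps(b + i + 4);
        _mm_storeu_ps(out + i,     _mm_add_ps(x0, y0));
        _mm_storeu_ps(out + i + 4, _mm_add_ps(x1, y1));
    }
    /* At most one remaining 4-wide vector. */
    for (; i + 4 <= n; i += 4) {
        _mm_storeu_ps(out + i,
                      _mm_add_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
    }
#endif

    /* Scalar tail, or the whole array when SSE is unavailable. */
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Whether something like this actually beats what the compiler already
generates would need to be measured per function, which is why I would
like to know which ufuncs matter most before starting.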

James
_______________________________________________
Numpy-discussion mailing list
[email protected]
http://projects.scipy.org/mailman/listinfo/numpy-discussion
