Personally, I think the time would be better spent optimizing the single-threaded code paths and relying on the BLAS and LAPACK libraries to use multiple cores for the more complex calculations. In particular, some basic loop unrolling and SSE versions of the ufuncs would be beneficial. I have some experience writing SSE code using intrinsics and would be happy to give it a shot if people tell me which functions I should focus on.
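As a rough illustration of the kind of loop meant by "loop unrolling and SSE versions of the ufuncs", here is a minimal sketch of an element-wise add over two double arrays. It assumes contiguous float64 inputs and an x86 CPU with SSE2 (always present on x86-64). The function name `add_doubles_sse2` is hypothetical, and this is not NumPy's actual ufunc inner loop, which also has to handle strided arrays.

```c
#include <emmintrin.h> /* SSE2 intrinsics: _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd */
#include <stddef.h>    /* size_t */

/* Hypothetical inner loop for out[i] = a[i] + b[i] on contiguous doubles.
 * Each __m128d register holds 2 doubles; the main loop uses two of them per
 * iteration (unrolled by 2), so it handles 4 elements at a time. */
void add_doubles_sse2(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* Unaligned loads, so the caller need not align the arrays. */
        __m128d a0 = _mm_loadu_pd(a + i);
        __m128d a1 = _mm_loadu_pd(a + i + 2);
        __m128d b0 = _mm_loadu_pd(b + i);
        __m128d b1 = _mm_loadu_pd(b + i + 2);
        _mm_storeu_pd(out + i,     _mm_add_pd(a0, b0));
        _mm_storeu_pd(out + i + 2, _mm_add_pd(a1, b1));
    }
    /* Scalar tail for the remaining n % 4 elements. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The scalar tail keeps the function correct for any length. A real ufunc version would likely fall back to a plain scalar loop whenever the inputs are not contiguous.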
James

_______________________________________________
Numpy-discussion mailing list
http://projects.scipy.org/mailman/listinfo/numpy-discussion
