Francesc Altet wrote:
> Why not? IMHO, complex operations requiring a great deal of operations
> per word, like trigonometric, exponential, etc., are the best
> candidates to take advantage of several cores or even SSE instructions
> (not sure whether SSE supports this sort of operations, though).
I was talking about the general question of using OpenMP in the numpy context. If it were just a matter of adding one line in one place in the source code, someone would already have done it, no? But there are build issues, for example: you have to add support for OpenMP at compile and link time, and you have to make sure the code still works with compilers that do not support it.

Even without taking the build issues into account, there is the problem of correctly annotating the source code depending on the context. Many interesting places to use OpenMP in numpy would need more than just the "parallel for" pragma. From what I know of OpenMP, the annotations may depend on the kind of operation you are doing (independent element-wise operations or not).

Also, the test case posted before uses a really big N, where you can be sure that multi-threading is efficient. What happens if N is small? Basically, the posted test is the best situation that can happen (big N, known operation in a known context, etc.). That is proof that OpenMP works, not that it can work for numpy.

I find the example of SSE rather enlightening: in theory, you should expect a 100-300 % speed increase using SSE, but even with pure C code, in a controlled manner, on one platform (linux + gcc), with varying recent CPUs, the results are fundamentally different. So what would happen in numpy, where you don't control things that much?

cheers,

David

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion