David Cournapeau wrote:
> Gnata Xavier wrote:
>
>> Ok, I will try to see what I can do, but it is sure that we do need
>> the plug-in system first (read "before the threads in the numpy
>> release"). During the devel of 1.1, I will try to find some time to
>> understand where I should put some pragmas into the ufuncs, using a
>> very conservative approach. Anyone with some OpenMP knowledge is
>> welcome, because I'm not an OpenMP expert but only an OpenMP user in
>> my C/C++ codes.
>
> Note that the plug-in idea is just my own idea; it is not something
> agreed on by anyone else. So maybe it won't be done for numpy 1.1, or
> at all. It depends on the main maintainers of numpy.
>
>> and the results:
>>
>>       size  loops  threaded (s)  single-threaded (s)
>>   10000000     80     10.308471            30.007250
>>    1000000    160      1.902563             5.800172
>>     100000    320      0.543008             1.123274
>>      10000    640      0.206823             0.223031
>>       1000   1280      0.088898             0.044268
>>        100   2560      0.150429             0.008880
>>         10   5120      0.289589             0.002084
>>
>> ---> On this machine, we should start to use threads *in this
>> testcase* iff size >= 10000 (and a 100*100 image is a very, very
>> small one :))
>
> Maybe OpenMP can be more clever, but this tends to show that OpenMP,
> when used naively, can *not* decide how many threads to use. That's
> really the core problem: again, I don't know much about OpenMP, but
> almost any project that uses multiple threads or processes and is not
> embarrassingly parallel runs into the same issue: threading makes
> things much slower in the many cases where thread creation and
> management have a lot of overhead relative to the computation itself.
> The problem is to determine that threshold N, dynamically or in a way
> which works well for most cases. OpenMP was created for HPC, where
> you have very large data; it is not so obvious to me that it is
> adapted to numpy, which has to be much more flexible. Being fast on a
> given problem is easy; being fast on a whole range is another story:
> the problem really is to be as fast as before on small arrays.
>
> The fact that matlab, while having many more resources than we do,
> took years to do it makes me extremely skeptical about the efficient
> use of multi-threading in numpy without real benchmarks. They have a
> dedicated team, which developed a JIT for matlab that "inserts"
> multi-threaded code on the fly (for m-files, not when you are in the
> interpreter), and they use multi-threaded blas/lapack (which is
> already available in numpy, depending on the blas/lapack you are
> using).
>
> But again, and that's really the only thing I have to say: prove me
> wrong :)
>
> David

I can't :) I can't, for a simple reason. Quoting the IDL
documentation:

"There are instances when allowing IDL to use its default thread pool
settings can lead to undesired results. In some instances, a
multithreaded implementation using the thread pool may actually take
longer to complete a given job than a single-threaded implementation."

http://idlastro.gsfc.nasa.gov/idl_html_help/The_IDL_Thread_Pool.html

"To prevent the use of the thread pool for computations that involve
too few data elements, IDL supports a minimum threshold value for
thread pool computations. The minimum threshold value is contained in
the TPOOL_MIN_ELTS field of the !CPU system variable. See the
following sections for details on modifying this value."

At work, I can see people switching from IDL to numpy/scipy/pylab.
They are very happy with numpy, but they would like to find this
"thread pool capability" in numpy. All these guys come from C (or
fortran), often from C/fortran MPI or OpenMP. They know which parts of
a code should be threaded and which should not. As a result, they are
very happy with the IDL thread pool. I'm just thinking about how to
translate that into numpy.
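For what it is worth, OpenMP itself can express exactly this kind of
TPOOL_MIN_ELTS-style minimum with an if() clause on the parallel
region, so the thread team is only forked for large arrays. A minimal
sketch (the function name is made up, and the 10000 threshold is just
the break-even point from the benchmark above, not a recommendation):

#include <math.h>

/* TPOOL_MIN_ELTS-style threshold: below this size the loop runs
   serially, so small arrays pay no thread-creation overhead. */
#define MIN_ELTS 10000

void sin_inplace(double *a, long n)
{
    /* The if() clause makes the parallelism conditional: OpenMP only
       forks the thread team when n is large enough to amortize it. */
    #pragma omp parallel for if (n >= MIN_ELTS)
    for (long i = 0; i < n; i++)
        a[i] = sin(a[i]);
}

Built with gcc -fopenmp (and -lm); without -fopenmp the pragma is
simply ignored and you get the old serial loop back, which is a nice
property for a conservative patch. Exposing the threshold as a global
instead of a #define would give the Python side the equivalent of
!CPU.TPOOL_MIN_ELTS.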
"There are instances when allowing IDL to use its default thread pool settings can lead to undesired results. In some instances, a multithreaded implementation using the thread pool may actually take longer to complete a given job than a single-threaded implementation." http://idlastro.gsfc.nasa.gov/idl_html_help/The_IDL_Thread_Pool.html "To prevent the use of the thread pool for computations that involve too few data elements, IDL supports a minimum threshold value for thread pool computations. The minimum threshold value is contained in the TPOOL_MIN_ELTS field of the !CPU system variable. See the following sections for details on modifying this value." At work, I can see people switching from IDL to numpy/scipy/pylab. They are very happy with numpy but they would to find this "thread pool capability" in numpy. All these guys come from C (or from fortran), often from C/fortran MPI or OpenMP. They know which part of a code should be thread and which part should not. As a result, they are very happy with the IDL thread pool. I'm just thinking how to translate that into numpy. Now I have to have a close look at the ufuncs code and to figure out how to add -fopenmp. From a very pragmatic point of view : What is the best/simplest way to use inline C or whatever to do that : "I have a large array A and, at some points of my nice numpy code, I would like to compute let say the threaded sum or the sine of this array? Assuming that I know how to write it in C/OpenMP code." (The background is "I really know that in my case it is much faster... and I asked my boss for a multi-core machine ;)"). Cheers, Xavier _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion