Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Gael Varoquaux
On Thu, Feb 12, 2009 at 12:42:37AM -0600, Robert Kern wrote: It is implemented using threads, with Windows native threads on Windows. I think Gaël really just meant threads there. I guess so :). Once you reformulate my remark in proper terms, this is indeed what comes out. I guess all what it

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
Hi Brian, On Thursday 12 February 2009, Brian Granger wrote: Hi, This is relevant for anyone who would like to speed up array based codes using threads. I have a simple loop that I have implemented using Cython: def backstep(np.ndarray opti, np.ndarray optf, int istart,

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Gregor Thalhammer
Brian Granger wrote: I am curious: would you know what would be different in numpy's case compared to matlab array model concerning locks ? Matlab, up to recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3 (or 7.4), it also uses multicore for mathematical functions (cos,

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
On Thursday 12 February 2009, Dag Sverre Seljebotn wrote: A quick digression: It would be interesting to see how a spec would look for integrating OpenMP natively into Cython for these kinds of purposes. Cython is still flexible as a language after all. That would be really nice indeed.

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Francesc Alted wrote: On Thursday 12 February 2009, Dag Sverre Seljebotn wrote: A quick digression: It would be interesting to see how a spec would look for integrating OpenMP natively into Cython for these kinds of purposes. Cython is still flexible as a language after all.

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Gregor Thalhammer wrote: Recent Matlab versions use Intel's Math Kernel Library, which performs automatic multi-threading - also for mathematical functions like sin etc, but not for addition, multiplication etc. It does if you have access to the parallel toolbox I mentioned earlier in this

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 7:15 AM, David Cournapeau wrote: Since openmp also exists on windows, I doubt that it is required that openmp uses pthread :) On Windows, MSVC uses Win32 threads and GCC (Cygwin and MinGW) uses pthreads. If you use OpenMP with MinGW, the executable becomes dependent on

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 11:30 AM, Dag Sverre Seljebotn wrote: It would be interesting to see how a spec would look for integrating OpenMP natively into Cython for these kinds of purposes. Cython is still flexible as a language after all. Avoiding language bloat is also important, but it is difficult to

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
On Thursday 12 February 2009, Dag Sverre Seljebotn wrote: FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know OpenMP and have little

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Francesc Alted
On Thursday 12 February 2009, Sturla Molden wrote: OpenMP does not need to be a part of the Cython language. It can be special comments in the code as in Fortran. After all, #pragma omp parallel is a comment in Cython. Hey! That's very nice to know. We already have OpenMP support in

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 12:20 PM, David Cournapeau wrote: It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to thread-based implementation). As has

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Sturla Molden wrote: On 2/12/2009 12:20 PM, David Cournapeau wrote: It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Francesc Alted wrote: I don't know OpenMP well enough either, but I'd say that on this list there could be some people who could help. At any rate, I really like the OpenMP approach and would much prefer to have support for it in Cython over threading, MPI or whatever. But the thing is: is

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Michael Abshoff
Sturla Molden wrote: On 2/12/2009 12:20 PM, David Cournapeau wrote: Hi, It does if you have access to the parallel toolbox I mentioned earlier in this thread (again, no experience with it, but I think it is especially popular on clusters; in that case, though, it is not limited to

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Sturla Molden wrote: On 2/12/2009 1:50 PM, Francesc Alted wrote: Hey! That's very nice to know. We already have OpenMP support in Cython for free (or apparently it seems so :-) No, we don't, as variable names are different in C and Cython. But adding support for OpenMP would

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
I am curious: would you know what would be different in numpy's case compared to matlab array model concerning locks ? Matlab, up to recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3 (or 7.4), it also uses multicore for mathematical functions (cos, etc...). So at least

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
Yes, it is. You have to link against pthread (at least with Linux ;)) You have to write a single parallel region if you don't want this overhead (which is not possible with Python). Matthieu 2009/2/12 Gael Varoquaux gael.varoqu...@normalesup.org: On Wed, Feb 11, 2009 at 11:52:40PM -0600,

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
2009/2/12 Sturla Molden stu...@molden.no: On 2/12/2009 1:50 PM, Francesc Alted wrote: Hey! That's very nice to know. We already have OpenMP support in Cython for free (or apparently it seems so :-) No, we don't, as variable names are different in C and Cython. But adding support for

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Matthieu Brucher wrote: No - I have never seen deep explanation of the matlab model. The C api is so small that it is hard to deduce anything from it (except that the memory handling is not ref-counting-based, I don't know if it matters for our discussion of speeding up ufunc). I would guess

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote: FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know OpenMP and have little

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread David Cournapeau
Matthieu Brucher wrote: Sorry, I was referring to my last mail, but I sent so many in 5 minutes ;) In C, if you have two arrays (two pointers), the compiler can't make aggressive optimizations because they may intersect. With Fortran, this is not possible. In this matter, Numpy behaves like C
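The aliasing point in the message above (two pointers that may intersect) can be illustrated with a small sketch. It uses the standard library's `array` and `memoryview` rather than NumPy so that it runs without extra packages; the variable names are made up for the illustration, and it is not meant to show what any compiler actually does.

```python
from array import array

# One buffer, two overlapping views: the situation a C compiler must
# assume is possible whenever it is handed two plain pointers.
buf = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
view = memoryview(buf)
src = view[0:4]  # elements 0..3
dst = view[1:5]  # elements 1..4, overlapping src in elements 1..3

# Naive element-by-element copy: each write to dst[i] also changes
# src[i + 1] before it is read, so the first value smears forward.
for i in range(len(src)):
    dst[i] = src[i]
smeared = buf.tolist()

# Taking a snapshot of the source first removes the overlap.
buf2 = array("d", [1.0, 2.0, 3.0, 4.0, 5.0])
view2 = memoryview(buf2)
snapshot = view2[0:4].tolist()
for i, value in enumerate(snapshot):
    view2[1 + i] = value
shifted = buf2.tolist()
```

The two results differ only because the views overlap, which is why code that cannot rule out overlap has to be conservative about reordering or splitting such a loop.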

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 1:44 PM, Sturla Molden wrote: Here is an example of SciPy's ckdtree.pyx modified to use OpenMP. It seems I managed to post an erroneous C file. :( S.M. /* * Parallel query for faster kd-tree searches on SMP computers. * This function will

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Michael Abshoff
David Cournapeau wrote: Matthieu Brucher wrote: For BLAS level 3, the MKL is parallelized (so matrix multiplication is). Hi David, Same for ATLAS: thread support is one focus in the 3.9 series, currently in development. ATLAS has had thread support for a long, long time. The 3.9 series

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
2009/2/12 David Cournapeau da...@ar.media.kyoto-u.ac.jp: Matthieu Brucher wrote: No - I have never seen deep explanation of the matlab model. The C api is so small that it is hard to deduce anything from it (except that the memory handling is not ref-counting-based, I don't know if it matters

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Matthieu Brucher
2009/2/12 David Cournapeau da...@ar.media.kyoto-u.ac.jp: Matthieu Brucher wrote: Sorry, I was referring to my last mail, but I sent so many in 5 minutes ;) In C, if you have two arrays (two pointers), the compiler can't make aggressive optimizations because they may intersect. With Fortran,

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Gael Varoquaux
On Thu, Feb 12, 2009 at 03:27:51PM +0100, Sturla Molden wrote: The question is: Should OpenMP be comments in the Cython code (as they are in C and Fortran), or should OpenMP be special objects? My two cents: go for cython objects/statements. Not only does code in comments look weird and a

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
On 2/12/2009 5:24 PM, Gael Varoquaux wrote: My two cents: go for cython objects/statements. Not only does code in comments look weird and like a hack, but it also means you have to hack the parser. I agree with this. Particularly because Cython uses indentation as syntax. With comments you

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Sturla Molden wrote: On 2/12/2009 12:34 PM, Dag Sverre Seljebotn wrote: FYI, I am one of the core Cython developers and can make such modifications in Cython itself as long as there's consensus on how it should look on the Cython mailing list. My problem is that I don't really know

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Dag Sverre Seljebotn wrote: Hmm... yes. Care would need to be taken though because Cython might in the future very well generate a while loop instead for such a statement under some circumstances, and that won't work with OpenMP. One should be careful with assuming what the C result will be

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
If your problem is evaluating vector expressions just like the above (i.e. without using transcendental functions like sin, exp, etc...), usually the bottleneck is on memory access, so using several threads is simply not going to help you achieve better performance, but rather the contrary

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
Recent Matlab versions use Intel's Math Kernel Library, which performs automatic multi-threading - also for mathematical functions like sin etc, but not for addition, multiplication etc. It seems to me Matlab itself does not take care of multi-threading. On

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
At any rate, I really like the OpenMP approach and would much prefer to have support for it in Cython over threading, MPI or whatever. But the thing is: is OpenMP stable and mature enough to be used on most common platforms? I think that recent GCC compilers support the latest

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
Wow, interesting thread. Thanks everyone for the ideas. A few more comments: GPUs/CUDA: * Even though there is a bottleneck between main memory and GPU memory, as Nathan mentioned, the much larger memory bandwidth on a GPU often makes GPUs great for memory bound computations...as long as you

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Dag Sverre Seljebotn
Brian Granger wrote: And a question: With the new Numpy support in Cython, does Cython release the GIL if it can when running through loops over numpy arrays? Does Cython call into the C API during these sections? You know, I thought of the exact same thing when reading your post.

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Brian Granger
You know, I thought of the exact same thing when reading your post. No, you need the GIL currently, but that's something I'd like to fix. Ideally, it would be something like this: cdef int i, s = 0, n = ... cdef np.ndarray[int] arr = ... # will require the GIL with nogil: for i in

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-12 Thread Sturla Molden
Sturla Molden wrote: IMO there's a problem with using literal variable names here, because Python syntax implies that the value is passed. One shouldn't make syntax where private=(i,) is legal but private=(f(),) isn't. The latter would be illegal in OpenMP as well. OpenMP pragmas only take
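The distinction raised in the message above, where `private=(i,)` names a variable and `private=(f(),)` is an arbitrary expression, can be checked mechanically. The following is only a sketch: `clause_names` is a hypothetical helper written for this illustration with Python's `ast` module, and it is neither Cython syntax nor part of any OpenMP tooling.

```python
import ast


def clause_names(source):
    """Return the variable names in a clause such as "(i, j)".

    Hypothetical helper: raises ValueError if any entry is something
    other than a bare variable name (for example a call like f()).
    """
    node = ast.parse(source, mode="eval").body
    # A clause may be a tuple of names or a single bare name.
    items = node.elts if isinstance(node, ast.Tuple) else [node]
    names = []
    for item in items:
        if not isinstance(item, ast.Name):
            raise ValueError("clause entries must be plain variable names")
        names.append(item.id)
    return names
```

With this helper, `clause_names("(i,)")` yields the single name `i`, while `clause_names("(f(),)")` raises `ValueError`, mirroring the legal and illegal cases named in the message.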

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Robert Kern
On Wed, Feb 11, 2009 at 23:46, Brian Granger ellisonbg@gmail.com wrote: Hi, This is relevant for anyone who would like to speed up array based codes using threads. I have a simple loop that I have implemented using Cython: def backstep(np.ndarray opti, np.ndarray optf, int

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
Eric Jones tried to do this with pthreads in C some time ago. His work is here: http://svn.scipy.org/svn/numpy/branches/multicore/ The lock overhead makes it usually not worthwhile. I was under the impression that Eric's implementation didn't use a thread pool. Thus I thought the

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Robert Kern
On Thu, Feb 12, 2009 at 00:03, Brian Granger ellisonbg@gmail.com wrote: Eric Jones tried to do this with pthreads in C some time ago. His work is here: http://svn.scipy.org/svn/numpy/branches/multicore/ The lock overhead makes it usually not worthwhile. I was under the impression

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Gael Varoquaux
On Wed, Feb 11, 2009 at 11:52:40PM -0600, Robert Kern wrote: These seem like pretty heavy solutions though. From a programmer's perspective, it seems to me like OpenMP is a much lighter weight solution than pthreads. From a programmer's perspective, because, IMHO, openmp is implemented using

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread David Cournapeau
Robert Kern wrote: Eric Jones tried to do this with pthreads in C some time ago. His work is here: http://svn.scipy.org/svn/numpy/branches/multicore/ The lock overhead makes it usually not worthwhile. I am curious: would you know what would be different in numpy's case compared to

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread David Cournapeau
Gael Varoquaux wrote: From a programmer's perspective, because, IMHO, openmp is implemented using pthreads. Since openmp also exists on windows, I doubt that it is required that openmp uses pthread :) On linux, with gcc, using -fopenmp implies -pthread, so I guess it uses pthread (can you be

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
I am curious: would you know what would be different in numpy's case compared to matlab array model concerning locks ? Matlab, up to recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3 (or 7.4), it also uses multicore for mathematical functions (cos, etc...). So at least

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread Brian Granger
Good point. Is it possible to tell what array size it switches over to using multiple threads? Yes. http://svn.scipy.org/svn/numpy/branches/multicore/numpy/core/threadapi.py Sorry, I was curious about what Matlab does in this respect. But, this is very useful and I will look at it.

Re: [Numpy-discussion] Fast threading solution thoughts

2009-02-11 Thread David Cournapeau
Brian Granger wrote: I am curious: would you know what would be different in numpy's case compared to matlab array model concerning locks ? Matlab, up to recently, only spreads BLAS/LAPACK on multi-cores, but since matlab 7.3 (or 7.4), it also uses multicore for mathematical functions (cos,