[Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Sebastian Haase
Hi, I assume that someone here could maybe help me, and I'm hoping it's not too much off topic. I have 2 arrays of 2d point coordinates and would like to calculate all pairwise distances as fast as possible. Going from Python/Numpy to a (Swigged) C extension already gave me a 55x speedup. (.9ms
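For comparison, the same all-pairs computation can be written in plain NumPy with broadcasting, which avoids any Python-level loop. This is a minimal sketch, not Sebastian's extension. It assumes the two point sets are float arrays of shape (nX, 2) and (nY, 2), and the function name `pairwise_dists` is hypothetical. It builds a temporary (nX, nY, 2) array, so peak memory is roughly three times the size of the result.

```python
import numpy as np

def pairwise_dists(a, b):
    """Return the (len(a), len(b)) matrix of Euclidean distances between 2d points.

    Hypothetical helper for illustration; not the C extension from this thread.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # a[:, None, :] has shape (nX, 1, 2) and b[None, :, :] has shape (1, nY, 2);
    # broadcasting the difference gives an (nX, nY, 2) array of coordinate offsets.
    diff = a[:, None, :] - b[None, :, :]
    # Sum the squared offsets over the coordinate axis, then take the square root.
    return np.sqrt((diff ** 2).sum(axis=-1))

# Example using the array sizes that appear later in the thread.
a = np.random.random((340, 2))
b = np.random.random((329, 2))
d = pairwise_dists(a, b)
print(d.shape)  # (340, 329)
```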

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Matthieu Brucher
Hi, My first move would be to add a restrict keyword to dist (i.e. dist is the only pointer to the specific memory location), and then declare dist_ inside the first loop also with a restrict. Then, I would run valgrind or a PAPI profile on your code to see what causes the issue (false sharing,

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Sebastian Haase
Thanks Matthieu, using __restrict__ with g++ did not change anything. How do I use valgrind with C extensions? I don't know what a PAPI profile is...? -Sebastian On Tue, Feb 15, 2011 at 4:54 PM, Matthieu Brucher matthieu.bruc...@gmail.com wrote: Hi, My first move would be to add a restrict

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Matthieu Brucher
Use restrict directly in C99 mode (__restrict does not have exactly the same semantics). For a valgrind profile, you can check my blog (http://matt.eifelle.com/2009/04/07/profiling-with-valgrind/). Basically, if you have a python script, you can run valgrind --optionsinmyblog python myscript.py For

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Wes McKinney
On Tue, Feb 15, 2011 at 11:25 AM, Matthieu Brucher matthieu.bruc...@gmail.com wrote: Use restrict directly in C99 mode (__restrict does not have exactly the same semantics). For a valgrind profile, you can check my blog (http://matt.eifelle.com/2009/04/07/profiling-with-valgrind/) Basically,

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Sebastian Haase
Wes, I think I should have a couple of GPUs. I would be ready for anything... if you think that I could do some easy(!) CUDA programming here, maybe you could guide me in the right direction... Thanks, Sebastian. On Tue, Feb 15, 2011 at 5:26 PM, Wes McKinney wesmck...@gmail.com wrote: On

[Numpy-discussion] convolving (or correlating) with sliding windows

2011-02-15 Thread Davide Cittaro
Hi all, I have to work with huge numpy.array (i.e. up to 250 M long) and I have to perform either np.correlate or np.convolve between those. The process can only work on big memory machines but it takes ages. I'm writing to get some hint on how to speed up things (at cost of precision,
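One common way to trade a little floating-point precision for speed is FFT-based convolution: `np.convolve` works directly, costing on the order of N·M operations, while the FFT route costs roughly O((N+M) log(N+M)). The sketch below is an illustration using only `numpy.fft`, not something proposed in the thread; the name `fft_convolve` is hypothetical. For real inputs, a full correlation can reuse it with the second array reversed. At 250M elements a single transform is itself very memory hungry, so in practice this would likely be combined with a block-wise scheme.

```python
import numpy as np

def fft_convolve(x, h):
    """Full linear convolution of 1-D real arrays x and h via the FFT.

    Hypothetical helper; results match np.convolve up to rounding error.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    n = len(x) + len(h) - 1  # length of the full linear convolution
    # Zero-pad to a power of two that is at least n long, so the circular
    # convolution computed by the FFT equals the linear one (no wrap-around).
    nfft = 1 << (n - 1).bit_length()
    X = np.fft.rfft(x, nfft)
    H = np.fft.rfft(h, nfft)
    # Pointwise product in the frequency domain, back-transform, trim padding.
    return np.fft.irfft(X * H, nfft)[:n]

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = rng.standard_normal(37)
conv = fft_convolve(x, h)          # compare with np.convolve(x, h)
corr = fft_convolve(x, h[::-1])    # compare with np.correlate(x, h, mode="full")
```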

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Wes McKinney
On Tue, Feb 15, 2011 at 11:33 AM, Sebastian Haase seb.ha...@gmail.com wrote: Wes, I think I should have a couple of GPUs. I would be ready for anything... if you think that I could do some easy(!) CUDA programming here, maybe you could guide me in the right direction... Thanks, Sebastian.

Re: [Numpy-discussion] convolving (or correlating) with sliding windows

2011-02-15 Thread josef . pktd
On Tue, Feb 15, 2011 at 11:42 AM, Davide Cittaro davide.citt...@ifom-ieo-campus.it wrote: Hi all, I have to work with huge numpy.array (i.e. up to 250 M long) and I have to perform either np.correlate or np.convolve between those. The process can only work on big memory machines but it takes

Re: [Numpy-discussion] convolving (or correlating) with sliding windows

2011-02-15 Thread josef . pktd
On Tue, Feb 15, 2011 at 11:42 AM, Davide Cittaro davide.citt...@ifom-ieo-campus.it wrote: Hi all, I have to work with huge numpy.array (i.e. up to 250 M long) and I have to perform either np.correlate or np.convolve between those. The process can only work on big memory machines but it takes

Re: [Numpy-discussion] convolving (or correlating) with sliding windows

2011-02-15 Thread Jonathan Hilmer
I'm sorry that I don't have any example code for you, but you probably need to break down the problem if you can't fit it into memory: http://en.wikipedia.org/wiki/Overlap-add_method Jonathan On Tue, Feb 15, 2011 at 10:27 AM, josef.p...@gmail.com wrote: On Tue, Feb 15, 2011 at 11:42 AM,
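The overlap-add method Jonathan links to can be sketched in a few lines of NumPy: split the long signal into blocks, convolve each block with the filter via the FFT, and add each block's result (which overhangs the block by len(h) - 1 samples) into the output. This is a minimal illustration, not code from the thread; `overlap_add_convolve` and the default `block_size` are assumptions. It keeps only one block's transform in memory at a time, but the output array is still full size; for outputs that do not fit in RAM, `out` could instead be an `np.memmap`.

```python
import numpy as np

def overlap_add_convolve(x, h, block_size=4096):
    """Full linear convolution of a long signal x with a shorter filter h,
    processed in blocks of block_size samples (overlap-add method).

    Hypothetical helper; block_size is an arbitrary default, not a tuned value.
    """
    x = np.asarray(x, dtype=float)
    h = np.asarray(h, dtype=float)
    m = len(h)
    out = np.zeros(len(x) + m - 1)
    # Each block's convolution is block_size + m - 1 long; round the FFT size
    # up to a power of two so the circular convolution does not wrap around.
    nfft = 1 << (block_size + m - 2).bit_length()
    H = np.fft.rfft(h, nfft)  # filter spectrum, computed once and reused
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        seg_len = len(block) + m - 1
        seg = np.fft.irfft(np.fft.rfft(block, nfft) * H, nfft)[:seg_len]
        # The tail of each block's result overlaps the next block's range,
        # so contributions are summed ("added") into the output.
        out[start:start + seg_len] += seg
    return out

rng = np.random.default_rng(1)
x = rng.standard_normal(10000)
h = rng.standard_normal(129)
y = overlap_add_convolve(x, h, block_size=1000)  # compare with np.convolve(x, h)
```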

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread eat
Hi, On Tue, Feb 15, 2011 at 5:50 PM, Sebastian Haase seb.ha...@gmail.com wrote: Hi, I assume that someone here could maybe help me, and I'm hoping it's not too much off topic. I have 2 arrays of 2d point coordinates and would like to calculate all pairwise distances as fast as possible.

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Sebastian Haase
Hi Eat, I will surely try these routines tomorrow, but I still think that neither scipy function does the complete distance calculation of all possible pairs as done by my C code. For 2 arrays, X and Y, of nX and nY 2d coordinates respectively, I need to get nX times nY distances computed. From
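Another way to get all nX × nY distances without the (nX, nY, 2) temporary is the expansion |x − y|² = |x|² + |y|² − 2 x·y, which turns the bulk of the work into one matrix product. This is a sketch of that general technique, not a method proposed in the thread, and `pairwise_dists_gemm` is a hypothetical name. For 2d points the matrix product has only two terms per entry, so any speedup is uncertain. The expansion also loses precision for nearly coincident points, hence the clip before the square root.

```python
import numpy as np

def pairwise_dists_gemm(X, Y):
    """All len(X) * len(Y) Euclidean distances via |x - y|^2 = |x|^2 + |y|^2 - 2 x.y.

    Hypothetical helper for illustration.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Row norms broadcast to an (nX, nY) grid; X @ Y.T gives all dot products.
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    # Rounding can make near-zero squared distances slightly negative; clip first.
    np.maximum(sq, 0.0, out=sq)
    return np.sqrt(sq)

rng = np.random.default_rng(2)
X = rng.random((50, 2))
Y = rng.random((40, 2))
D = pairwise_dists_gemm(X, Y)  # shape (nX, nY) == (50, 40)
```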

[Numpy-discussion] f2py target file xxx not generated

2011-02-15 Thread Thomas Ingeman-Nielsen
Hi, I'm trying to get started with f2py in a Windows 7 environment using the Python(x,y) v 2.6.5.6 distribution. I'm following the introductory example of the f2py user guide and trying to wrap the file FIB1.F using the command: f2py.py -c fib1.f -m fib1 from the windows command line. I get the

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Chris Colbert
The `cdist` function in scipy.spatial does what you want, and takes ~1ms on my machine.
In [1]: import numpy as np
In [2]: from scipy.spatial.distance import cdist
In [3]: a = np.random.random((340, 2))
In [4]: b = np.random.random((329, 2))
In [5]: c = cdist(a, b)
In [6]: c.shape
Out[6]:

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Jonathan Taylor
Take a look at a nice project coming out of my department: http://code.google.com/p/cudamat/ Best, Jon. On Tue, Feb 15, 2011 at 11:33 AM, Sebastian Haase seb.ha...@gmail.com wrote: Wes, I think I should have a couple of GPUs. I would be ready for anything ... if you think that I could do

Re: [Numpy-discussion] OT: performance in C extension; OpenMP, or SSE ?

2011-02-15 Thread Eric Carlson
I don't have the slightest idea what I'm doing, but
file name - the_lib.c
___
#include <stdio.h>
#include <time.h>
#include <omp.h>
#include <math.h>

void dists2d(
    double *a_ps, int na,
    double *b_ps, int nb,
    double *dist, int num_threads)
{
    int
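The idea behind the `num_threads` parameter in that C signature (give each thread its own slice of rows of the output) can be mimicked from Python with a thread pool. NumPy releases the GIL inside many array operations, so the workers may overlap, though for arrays as small as the ones in this thread the threading overhead could easily dominate. This is a hypothetical sketch, not Eric's code: `pairwise_dists_threaded` and the default of 4 threads are assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def pairwise_dists_threaded(a, b, num_threads=4):
    """Fill an (na, nb) distance matrix, splitting the rows of a across threads.

    Hypothetical helper loosely mirroring the num_threads idea of dists2d.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    dist = np.empty((len(a), len(b)))

    def work(rows):
        # Each worker writes only its own rows of dist, so no locking is needed.
        diff = a[rows, None, :] - b[None, :, :]
        dist[rows] = np.sqrt((diff ** 2).sum(axis=-1))

    # Contiguous row blocks, one per thread (similar to a static OpenMP schedule).
    bounds = np.linspace(0, len(a), num_threads + 1).astype(int)
    blocks = [slice(lo, hi) for lo, hi in zip(bounds[:-1], bounds[1:])]
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces completion and re-raises any exception from a worker.
        list(pool.map(work, blocks))
    return dist

rng = np.random.default_rng(3)
a = rng.random((101, 2))
b = rng.random((37, 2))
D = pairwise_dists_threaded(a, b, num_threads=4)
```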