Use directly restrict in C99 mode (__restrict does not have exactly the same semantics).
For a valgrind profil, you can check my blog ( http://matt.eifelle.com/2009/04/07/profiling-with-valgrind/) Basically, if you have a python script, you can valgrind --optionsinmyblog python myscript.py For PAPI, you have to install several packages (perf module for kernel for instance) and a GUI to analyze the results (in Eclispe, it should be possible). Matthieu 2011/2/15 Sebastian Haase <seb.ha...@gmail.com> > Thanks Matthieu, > using __restrict__ with g++ did not change anything. How do I use > valgrind with C extensions? > I don't know what "PAPI profil" is ...? > -Sebastian > > > On Tue, Feb 15, 2011 at 4:54 PM, Matthieu Brucher > <matthieu.bruc...@gmail.com> wrote: > > Hi, > > My first move would be to add a restrict keyword to dist (i.e. dist is > the > > only pointer to the specific memory location), and then declare dist_ > inside > > the first loop also with a restrict. > > Then, I would run valgrind or a PAPI profil on your code to see what > causes > > the issue (false sharing, ...) > > Matthieu > > > > 2011/2/15 Sebastian Haase <seb.ha...@gmail.com> > >> > >> Hi, > >> I assume that someone here could maybe help me, and I'm hoping it's > >> not too much off topic. > >> I have 2 arrays of 2d point coordinates and would like to calculate > >> all pairwise distances as fast as possible. > >> Going from Python/Numpy to a (Swigged) C extension already gave me a > >> 55x speedup. > >> (.9ms vs. 50ms for arrays of length 329 and 340). > >> I'm using gcc on Linux. > >> Now I'm wondering if I could go even faster !? > >> My hope that the compiler might automagically do some SSE2 > >> optimization got disappointed. > >> Since I have a 4 core CPU I thought OpenMP might be an option; > >> I never used that, and after some playing around I managed to get > >> (only) 50% slowdown(!) :-( > >> > >> My code in short is this: > >> (My SWIG typemaps use obj_to_array_no_conversion() from numpy.i) > >> -------<Ccode> ---------- > >> void dists2d( > >> double *a_ps, int nx1, int na, > >> double *b_ps, int nx2, int nb, > >> double *dist, int nx3, int ny3) throw (char*) > >> { > >> if(nx1 != 2) throw (char*) "a must be of shape (n,2)"; > >> if(nx2 != 2) throw (char*) "b must be of shape (n,2)"; > >> if(nx3 != nb || ny3 != na) throw (char*) "c must be of shape > (na,nb)"; > >> > >> double *dist_; > >> int i, j; > >> > >> #pragma omp parallel private(dist_, j, i) > >> { > >> #pragma omp for nowait > >> for(i=0;i<na;i++) > >> { > >> //num_threads=omp_get_num_threads(); --> 4 > >> dist_ = dist+i*nb; // dists_ is only > >> introduced for OpenMP > >> for(j=0;j<nb;j++) > >> { > >> *dist_++ = sqrt( sq(a_ps[i*nx1] - b_ps[j*nx2]) > + > >> > sq(a_ps[i*nx1+1] > >> - b_ps[j*nx2+1]) ); > >> } > >> } > >> } > >> } > >> -------</Ccode> ---------- > >> There is probably a simple mistake in this code - as I said I never > >> used OpenMP before. > >> It should be not too difficult to use OpenMP correctly here > >> or - maybe better - > >> is there a simple SSE(2,3,4) version that might be even better than > >> OpenMP... !? > >> > >> I supposed, that I did not get the #pragma omp lines right - any idea ? > >> Or is it in general not possible to speed this kind of code up using > >> OpenMP !? > >> > >> Thanks, > >> Sebastian Haase > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion@scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > -- > > Information System Engineer, Ph.D. > > Blog: http://matt.eifelle.com > > LinkedIn: http://www.linkedin.com/in/matthieubrucher > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion