Hi, I'm not sure if this is of much interest but it's been really puzzling me so I thought I'd ask.
In an earlier post I described how I was surprised that a simple f2py-wrapped Fortran bincount was 4x faster than np.bincount - but that difference only seemed to show up on my Mac; on moving to Linux they both took more or less the same time. I'm trying to work out whether it is worth moving some of my bottlenecks (most of which are np builtins) to Fortran. So far it looks like it is - but only on my Mac and only in 32 bit (see below).

The only explanation I could think of was that the gcc-4.0 used to build numpy on a Mac didn't perform so well, so after upgrading to Snow Leopard I've been trying to look at this again. I was hoping I could get the same performance on my Mac as on Linux, which would make the numpy C code a couple of times faster. So far, with Python 2.6.3 in 64 bit, numpy seems to be significantly slower and my Fortran code _much_ slower - even though it is built with the same compiler. Can anyone help me understand what is going on?

I have only been able to build 32 bit numpy against 2.5.4 with Apple gcc-4.0, and 64 bit numpy against 2.6.3 universal with gcc-4.2. I haven't been able to get a numpy I can import on 2.6.3 in 32 bit mode ( http://projects.scipy.org/numpy/ticket/1221 ).

Here are the results for python.org 32 bit 2.5.4, numpy compiled with Apple gcc 4.0, f2py using att gfortran 4.2:

In [2]: timeit x = np.random.random_integers(0,1023,100000000).astype(int)
1 loops, best of 3: 2.86 s per loop

In [3]: x = np.random.random_integers(0,1023,100000000).astype(int)

In [4]: timeit np.bincount(x)
1 loops, best of 3: 435 ms per loop

In [6]: timeit gf42.bincount(x,1024)
10 loops, best of 3: 129 ms per loop

In [7]: np.__version__
Out[7]: '1.4.0.dev7618'

And for self-built (Apple gcc 4.2) 64 bit 2.6.3, numpy compiled with Apple gcc 4.2, f2py using the same att gfortran 4.2:

In [3]: timeit x = np.random.random_integers(0,1023,100000000).astype(int)
1 loops, best of 3: 3.91 s per loop    # 37% slower than 32 bit

In [4]: x = np.random.random_integers(0,1023,100000000).astype(int)

In [5]: timeit np.bincount(x)
1 loops, best of 3: 582 ms per loop    # 34% slower than 32 bit

In [8]: timeit gf42_64.bincount(x,1024)
1 loops, best of 3: 803 ms per loop    # 522% slower than 32 bit

So why is there this big difference in performance? I'd really like to know why the Fortran compiled with the same compiler is so much slower in 64 bit mode. As far as I can tell the flags used are the same. Also, why is numpy slower?

I was surprised that I was able to import the 64 bit universal module built with f2py from 2.6 inside 32 bit 2.5, and there it was quick again - so it seems the x86_64 code generated by the Fortran compiler is much slower (rather than any wrappers or such).

I tried using some more recent gfortrans from macports - but could only use them to build modules against the 64 bit python/numpy, since I couldn't find a way to get f2py to force 32 bit output. The performance was more or less the same (always several times slower than the 32 bit att gfortran).

Any advice appreciated.

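
As a quick check on the percentages quoted above, the slowdowns can be recomputed from the timings. This is only an arithmetic sketch: the helper name percent_slower is made up for illustration, and the numbers are copied by hand from the two sessions.

```python
def percent_slower(slow, fast):
    """Return how much slower `slow` is than `fast`, as a percentage."""
    return (slow - fast) / fast * 100.0

# (32 bit time, 64 bit time), copied from the two sessions above
timings = {
    "random_integers (s)": (2.86, 3.91),
    "np.bincount (ms)": (435.0, 582.0),
    "f2py bincount (ms)": (129.0, 803.0),
}

slowdowns = {
    name: percent_slower(t64, t32) for name, (t32, t64) in timings.items()
}

for name, pct in slowdowns.items():
    print("%-22s %.0f%% slower in 64 bit" % (name, pct))
```

The f2py figure corresponds to the 64 bit module taking a bit over six times as long as the 32 bit one.
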
Cheers

Robin

--------

! Count how often each value 0..m-1 occurs in x.
! Values in x are assumed to lie in 0..m-1; no bounds checking is done.
subroutine bincount (x,c,n,m)
    implicit none
    integer, intent(in) :: n,m
    integer, dimension(0:n-1), intent(in) :: x
    integer, dimension(0:m-1), intent(out) :: c
    integer :: i

    c = 0
    do i = 0, n-1
        c(x(i)) = c(x(i)) + 1
    end do
end
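
For anyone who wants to check the wrapped routine's output without a Fortran compiler, here is a rough pure-NumPy equivalent of the subroutine above. The function name bincount_ref is made up for this sketch, and it only mirrors the loop; it is not meant to be fast. It also prints the item size of the input array, since NumPy's int and a default Fortran integer need not have the same width on every build. Whether that plays any role in the timings above is not established here.

```python
import numpy as np

def bincount_ref(x, m):
    """Count occurrences of each value 0..m-1 in x, like the Fortran loop."""
    x = np.asarray(x)
    c = np.zeros(m, dtype=np.int64)
    # np.add.at accumulates repeated indices, matching c(x(i)) = c(x(i)) + 1
    np.add.at(c, x, 1)
    return c

rng = np.random.RandomState(0)
# Much smaller than the 100000000 elements used in the timings above
x = rng.randint(0, 1024, size=100000)
counts = bincount_ref(x, 1024)

# np.bincount may return fewer than 1024 bins if the largest values are absent
expected = np.bincount(x)
assert np.array_equal(counts[:expected.size], expected)
assert counts[expected.size:].sum() == 0

print("input dtype:", x.dtype, "itemsize:", x.dtype.itemsize)
```
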
