On Sun, Apr 7, 2013 at 5:56 PM, Charles R Harris <[email protected]> wrote:

> On Sun, Apr 7, 2013 at 5:23 PM, Tom Aldcroft <[email protected]> wrote:
>
>> I'm seeing about a factor of 50 difference in performance between
>> sorting a random integer array versus sorting that same array viewed
>> as a structured array. Am I doing anything wrong here?
>>
>> In [2]: x = np.random.randint(10000, size=10000)
>>
>> In [3]: xarr = x.view(dtype=[('a', np.int)])
>>
>> In [4]: timeit np.sort(x)
>> 1000 loops, best of 3: 588 us per loop
>>
>> In [5]: timeit np.sort(xarr)
>> 10 loops, best of 3: 29 ms per loop
>>
>> In [6]: timeit np.sort(xarr, order=('a',))
>> 10 loops, best of 3: 28.9 ms per loop
>>
>> I was wondering if this slowdown is expected (maybe the comparison is
>> dropping back to pure Python or ??). I'm showing a simple example
>> here, but in reality I'm working with non-trivial structured arrays
>> where I might want to sort on multiple columns.
>>
>> Does anyone have suggestions for speeding things up, or have a sort
>> implementation (perhaps Cython) that has better performance for
>> structured arrays?
>
> This is probably due to the comparison function used. For straight
> integers the C operator `<` is used, for dtypes the dtype comparison
> function is passed as a pointer to the routines. I doubt Cython would make
> any difference in this case, but making the dtype comparison routine better
> would probably help a lot. For all I know, the dtype gets parsed on every
> call to the comparison function.

Note that even sorting as a byte string is notably faster:

In [13]: sarr = x.view(dtype='<S8')

In [14]: timeit sort(sarr)
1000 loops, best of 3: 1.31 ms per loop

Chuck
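One possible way around the generic structured-dtype comparison described above is to sort on the plain integer fields directly and then reorder the records with the resulting indices. This sketch is not from the thread. It assumes a hypothetical two-column array with fields 'a' and 'b' and a NumPy recent enough to provide np.random.default_rng. np.argsort handles a single key, and np.lexsort handles several keys, taking the primary key last.

```python
import numpy as np

# Hypothetical structured array with two integer columns, 'a' and 'b';
# the field names and sizes are illustrative assumptions.
rng = np.random.default_rng(0)
n = 10000
arr = np.empty(n, dtype=[('a', np.int64), ('b', np.int64)])
arr['a'] = rng.integers(0, 100, size=n)
arr['b'] = rng.integers(0, 10000, size=n)

# Single key: argsort the plain int64 field (an ordinary integer array,
# not the structured dtype), then reorder the whole records with it.
order_a = np.argsort(arr['a'], kind='stable')
by_a = arr[order_a]

# Multiple keys: np.lexsort sorts by the LAST key first, so the primary
# key ('a') goes last and the tie-breaker ('b') goes before it.
order_ab = np.lexsort((arr['b'], arr['a']))
by_ab = arr[order_ab]
```

The result of the lexsort version should match np.sort(arr, order=('a', 'b')). Whether it is actually faster on a given NumPy version and dataset is something to check with timeit rather than assume.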
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
