Re: [Numpy-discussion] Sort performance with structured array

Charles R Harris Sun, 07 Apr 2013 16:56:29 -0700

On Sun, Apr 7, 2013 at 5:23 PM, Tom Aldcroft
<[email protected]> wrote:


> I'm seeing about a factor of 50 difference in performance between
> sorting a random integer array versus sorting that same array viewed
> as a structured array.  Am I doing anything wrong here?
>
> In [2]: x = np.random.randint(10000, size=10000)
>
> In [3]: xarr = x.view(dtype=[('a', np.int)])
>
> In [4]: timeit np.sort(x)
> 1000 loops, best of 3: 588 us per loop
>
> In [5]: timeit np.sort(xarr)
> 10 loops, best of 3: 29 ms per loop
>
> In [6]: timeit np.sort(xarr, order=('a',))
> 10 loops, best of 3: 28.9 ms per loop
>
> I was wondering if this slowdown is expected (maybe the comparison is
> dropping back to pure Python or ??).  I'm showing a simple example
> here, but in reality I'm working with non-trivial structured arrays
> where I might want to sort on multiple columns.
>
> Does anyone have suggestions for speeding things up, or have a sort
> implementation (perhaps Cython) that has better performance for
> structured arrays?
>

This is probably due to the comparison function used. For plain integers
the sort uses the C operator `<` directly, whereas for structured dtypes the
dtype's comparison function is passed to the sort routines as a function
pointer. I doubt Cython would make any difference in this case, but
improving the dtype comparison routine would probably help a lot. For all I
know, the dtype gets parsed on every call to the comparison function.
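
In the meantime, one possible workaround is to keep the comparisons on the
plain integer fields and use the resulting index to reorder the records. The
following is only a rough sketch: the two-field dtype and the names `arr`,
`by_a`, and `by_a_then_b` are made up for illustration, and I have not timed
it against your real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-field structured array, for illustration only.
dt = np.dtype([('a', np.int64), ('b', np.int64)])
arr = np.empty(10000, dtype=dt)
arr['a'] = rng.integers(100, size=arr.size)
arr['b'] = rng.integers(100, size=arr.size)

# Single key: argsort the plain integer field, then reorder the records.
by_a = arr[np.argsort(arr['a'], kind='stable')]

# Multiple keys: np.lexsort treats the LAST key in the tuple as the
# primary sort key, so this orders by 'a' first and then by 'b'.
idx = np.lexsort((arr['b'], arr['a']))
by_a_then_b = arr[idx]

# Reference result from the structured-dtype sort path, for comparison.
reference = np.sort(arr, order=('a', 'b'))
```

Both `np.argsort` on a single field and `np.lexsort` compare native integer
arrays, so they should avoid the structured comparison function entirely;
whether that recovers the whole factor of 50 is something you would need to
measure.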

Chuck

_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
