On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor <jtaylor.deb...@googlemail.com> wrote:
> On 17.02.2014 15:18, Francesc Alted wrote:
> > On 2/17/14, 1:08 AM, josef.p...@gmail.com wrote:
> >> On Sun, Feb 16, 2014 at 6:12 PM, Daπid <davidmen...@gmail.com> wrote:
> >>> On 16 February 2014 23:43, <josef.p...@gmail.com> wrote:
> >>>> What's the fastest argsort for a 1d array with around 28 million
> >>>> elements, roughly uniformly distributed, in random order?
> >>>
> >>> On the latest numpy version:
> >>>
> >>> for kind in ['quicksort', 'mergesort', 'heapsort']:
> >>>     print kind
> >>>     %timeit np.sort(data, kind=kind)
> >>>     %timeit np.argsort(data, kind=kind)
> >>>
> >>> quicksort
> >>> 1 loops, best of 3: 3.55 s per loop
> >>> 1 loops, best of 3: 10.3 s per loop
> >>> mergesort
> >>> 1 loops, best of 3: 4.84 s per loop
> >>> 1 loops, best of 3: 9.49 s per loop
> >>> heapsort
> >>> 1 loops, best of 3: 12.1 s per loop
> >>> 1 loops, best of 3: 39.3 s per loop
> >>>
> >>> It looks like quicksort is quicker at sorting, but mergesort is
> >>> marginally faster at argsorting. The difference is slim, but upon
> >>> repetition it remains significant.
> >>>
> >>> Why is that? Probably part of the reason is what Eelco said, and
> >>> part is that in sort the comparisons access the array elements
> >>> directly, but in argsort you have to index the array, introducing
> >>> some overhead.
> >>
> >> Thanks, both.
> >>
> >> I also gain a second with mergesort.
> >>
> >> Matlab would be nicer in my case, it returns both.
> >> I still need to use the argsort to index into the array to also get
> >> the sorted array.
> >
> > Many years ago I needed something similar, so I made some functions
> > for sorting and argsorting in one single shot. Maybe you want to
> > reuse them.
> > Here is an example of the C implementation:
> >
> > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
> >
> > and here the Cython wrapper for all of them:
> >
> > https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
> >
> > Francesc
>
> That doesn't really make a big difference if the data is randomly
> distributed. The sorting operation is normally much more expensive
> than later applying the indices:
>
> In [1]: d = np.arange(10000000)
>
> In [2]: np.random.shuffle(d)
>
> In [3]: %timeit np.argsort(d)
> 1 loops, best of 3: 1.99 s per loop
>
> In [4]: idx = np.argsort(d)
>
> In [5]: %timeit d[idx]
> 1 loops, best of 3: 213 ms per loop
>
> But if your data is not random it can make a difference, as even
> quicksort can be a lot faster then. Timsort would be a nice addition
> to numpy; it performs very well for partially sorted data.
> Unfortunately it's quite complicated to implement.

Quicksort and shellsort gain speed by having simple inner loops. I have the
impression that timsort is optimal when compares and memory access are
expensive, but I haven't seen any benchmarks for native types in contiguous
memory.

Chuck
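To summarize the workaround discussed above: since NumPy has no single call
that returns both the sorted values and the permutation, the usual idiom is
to argsort once and reuse the indices for the extra fancy-indexing pass,
which Julian's timings show is cheap compared with the sort itself. A minimal
sketch (array size scaled down from the thread's 28M-element example;
`kind='mergesort'` chosen per David's argsort timings):

```python
import numpy as np

# A shuffled array, mirroring Julian's setup but smaller.
d = np.arange(1_000_000)
np.random.shuffle(d)

# MATLAB-style "[s, i] = sort(x)": sort once via argsort, then apply
# the permutation to recover the sorted values as well.
idx = np.argsort(d, kind='mergesort')  # marginally faster for argsort
s = d[idx]                             # cheap indexing pass, no second sort

# Sanity check: same result as sorting directly.
assert np.array_equal(s, np.sort(d))
```

Per the `In [3]`/`In [5]` timings above, the `d[idx]` step costs roughly a
tenth of the argsort, so doing the sort twice is the only thing worth avoiding.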
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion