On Wed, Apr 14, 2010 at 4:39 PM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Wed, Apr 14, 2010 at 3:12 PM, Anne Archibald > <peridot.face...@gmail.com> wrote: > > On 14 April 2010 16:56, Keith Goodman <kwgood...@gmail.com> wrote: > >> On Wed, Apr 14, 2010 at 12:39 PM, Nikolaus Rath <nikol...@rath.org> > wrote: > >>> Keith Goodman <kwgood...@gmail.com> writes: > >>>> On Wed, Apr 14, 2010 at 8:49 AM, Keith Goodman <kwgood...@gmail.com> > wrote: > >>>>> On Wed, Apr 14, 2010 at 8:16 AM, Nikolaus Rath <nikol...@rath.org> > wrote: > >>>>>> Hello, > >>>>>> > >>>>>> How do I best find out the indices of the largest x elements in an > >>>>>> array? > >>>>>> > >>>>>> Example: > >>>>>> > >>>>>> a = [ [1,8,2], [2,1,3] ] > >>>>>> magic_function(a, 2) == [ (0,1), (1,2) ] > >>>>>> > >>>>>> Since the largest 2 elements are at positions (0,1) and (1,2). > >>>>> > >>>>> Here's a quick way to rank the data if there are no ties and no NaNs: > >>>> > >>>> ...or if you need the indices in order: > >>>> > >>>>>> shape = (3,2) > >>>>>> x = np.random.rand(*shape) > >>>>>> x > >>>> array([[ 0.52420123, 0.43231286], > >>>> [ 0.97995333, 0.87416228], > >>>> [ 0.71604075, 0.66018382]]) > >>>>>> r = x.reshape(-1).argsort().argsort() > >>> > >>> I don't understand why this works. Why do you call argsort() twice? > >>> Doesn't that give you the indices of the sorted indices? > >> > >> It is confusing. Let's look at an example: > >> > >>>> x = np.random.rand(4) > >>>> x > >> array([ 0.37412289, 0.68248559, 0.12935131, 0.42510212]) > >> > >> If we call argsort once we get the index that will sort x: > >> > >>>> idx = x.argsort() > >>>> idx > >> array([2, 0, 3, 1]) > >>>> x[idx] > >> array([ 0.12935131, 0.37412289, 0.42510212, 0.68248559]) > >> > >> Notice that the first element of idx is 2. That's because element x[2] > >> is the min of x. But that's not what we want. We want the first > >> element to be the rank of the first element of x. So we need to > >> shuffle idx around so that the order aligns with x. How do we do that? > >> We sort it! > > > > Unless I am mistaken, what you are doing here is inverting the > > permutation returned by the first argsort. The second argsort is an n > > log n method, though, and permutations can be inverted in linear time: > > > > ix = np.argsort(X) > > ixinv = np.zeros_like(ix) > > ixinv[ix] = np.arange(len(ix)) > > > > This works because if ix is a permutation and ixinv is its inverse, > > A = B[ix] > > is the same as > > A[ixinv] = B > > This also means that you can often do without the actual inverse by > > moving the indexing operation to the other side of the equal sign. > > (Not in the OP's case, though.) > > That is very nice. And very fast for large arrays: > > >> x = np.random.rand(4) > >> timeit idx = x.argsort().argsort() > 1000000 loops, best of 3: 1.45 us per loop > >> timeit idx = x.argsort(); idxinv = np.zeros_like(idx); idxinv[idx] = > np.arange(len(idx)) > 100000 loops, best of 3: 9.52 us per loop > > >> x = np.random.rand(1000) > >> timeit idx = x.argsort().argsort() > 10000 loops, best of 3: 112 us per loop > >> timeit idx = x.argsort(); idxinv = np.zeros_like(idx); idxinv[idx] = > np.arange(len(idx)) > 10000 loops, best of 3: 82.9 us per loop > > >> x = np.random.rand(100000) > >> timeit idx = x.argsort().argsort() > 10 loops, best of 3: 20.4 ms per loop > >> timeit idx = x.argsort(); idxinv = np.zeros_like(idx); idxinv[idx] = > np.arange(len(idx)) > 100 loops, best of 3: 13.2 ms per loop > > > I'll also add that if the OP needs the top m for 1<m<<n, sorting the > > whole input array is not the most efficient algorithm; there are > > priority-queue-based schemes that are asymptotically more efficient, > > but none exists in numpy. Since numpy's sorting is quite fast, I > > personally would just use the sorting. > > Partial sorting would find a lot of uses in the numpy community (like > in median). > Thinking about it... along with a lot of other things. Chuck
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion