2009/12/2 Keith Goodman <[email protected]>: > On Wed, Dec 2, 2009 at 5:27 PM, Anne Archibald > <[email protected]> wrote: >> 2009/12/2 Keith Goodman <[email protected]>: >>> On Wed, Dec 2, 2009 at 5:09 PM, Neal Becker <[email protected]> wrote: >>>> David Warde-Farley wrote: >>>> >>>>> On 2-Dec-09, at 6:55 PM, Howard Chong wrote: >>>>> >>>>>> def myFindMaxA(myList): >>>>>> """implement finding maximum value with for loop iteration""" >>>>>> maxIndex=0 >>>>>> maxVal=myList[0] >>>>>> for index, item in enumerate(myList): >>>>>> if item[0]>maxVal: >>>>>> maxVal=item[0] >>>>>> maxIndex=index >>>>>> return [maxIndex, maxVal] >>>>>> >>>>>> >>>>>> >>>>>> My question is: how can I make the latter version run faster? I >>>>>> think the >>>>>> answer is that I have to do the iteration in C. >>>>> >>>>> >>>>> def find_biggest_n(myarray, n): >>>>> ind = np.argsort(myarray) >>>>> return ind[-n:], myarray[ind[-n:]] >>>>> >>>>> David >>>> Not bad, although I wonder whether a partial sort could be faster. >>> >>> I'm doing a lot of sorting right now. I only need to sort the lowest >>> 30% of values in a 1d array (about 250k elements), the rest I don't >>> need to sort. How do I do a partial sort? >> >> Algorithmically, if you're doing a quicksort, you just don't sort one >> side of a partition when it's outside the range you want sorted (which >> could even be data-dependent, I suppose, as well as >> number-of-items-dependent). This is useful for sorting out only the >> extreme elements, as well as finding quantiles (and those elements >> above/below them). Unfortunately I'm not aware of any implementation, >> useful though it might be. > > Oh, I thought he meant there was a numpy function for partial sorting. > > What kind of speed up for my problem (sort lower 30% of a 1d array > with 250k elements) could I expect if I paid someone to write it in > cython? Twice as fast? I'm actually doing x.argsort(), if that > matters.
Hard to say exactly, but I'd say at best a factor of two. You can get a rough best-case value by doing an argmin followed by an argsort of the first 30%. In reality things will be a bit worse because you won't immediately be able to discard the whole rest of the array, and because your argsort will be accessing data spread over a larger swath of memory (so worse cache performance). But if this doesn't provide a useful speedup, it's very unlikely partial sorting will be of any use to you. Medians and other quantiles, or partitioning tasks, are where it really shines. Anne _______________________________________________ NumPy-Discussion mailing list [email protected] http://mail.scipy.org/mailman/listinfo/numpy-discussion
