2009/12/2 Keith Goodman <[email protected]>:
> On Wed, Dec 2, 2009 at 5:27 PM, Anne Archibald
> <[email protected]> wrote:
>> 2009/12/2 Keith Goodman <[email protected]>:
>>> On Wed, Dec 2, 2009 at 5:09 PM, Neal Becker <[email protected]> wrote:
>>>> David Warde-Farley wrote:
>>>>
>>>>> On 2-Dec-09, at 6:55 PM, Howard Chong wrote:
>>>>>
>>>>>> def myFindMaxA(myList):
>>>>>>    """implement finding maximum value with for loop iteration"""
>>>>>>    maxIndex=0
>>>>>>    maxVal=myList[0]
>>>>>>    for index, item in enumerate(myList):
>>>>>>        if item[0]>maxVal:
>>>>>>            maxVal=item[0]
>>>>>>            maxIndex=index
>>>>>>    return [maxIndex, maxVal]
>>>>>>
>>>>>>
>>>>>>
>>>>>> My question is: how can I make the latter version run faster? I
>>>>>> think the
>>>>>> answer is that I have to do the iteration in C.
>>>>>
>>>>>
>>>>> def find_biggest_n(myarray, n):
>>>>> ind = np.argsort(myarray)
>>>>> return ind[-n:], myarray[ind[-n:]]
>>>>>
>>>>> David
>>>> Not bad, although I wonder whether a partial sort could be faster.
>>>
>>> I'm doing a lot of sorting right now. I only need to sort the lowest
>>> 30% of values in a 1d array (about 250k elements), the rest I don't
>>> need to sort. How do I do a partial sort?
>>
>> Algorithmically, if you're doing a quicksort, you just don't sort one
>> side of a partition when it's outside the range you want sorted (which
>> could even be data-dependent, I suppose, as well as
>> number-of-items-dependent). This is useful for sorting out only the
>> extreme elements, as well as finding quantiles (and those elements
>> above/below them). Unfortunately I'm not aware of any implementation,
>> useful though it might be.
>
> Oh, I thought he meant there was a numpy function for partial sorting.
>
> What kind of speed up for my problem (sort lower 30% of a 1d array
> with 250k elements) could I expect if I paid someone to write it in
> cython? Twice as fast? I'm actually doing x.argsort(), if that
> matters.

Hard to say exactly, but I'd say at best a factor of two. You can get
a rough best-case value by doing an argmin followed by an argsort of
the first 30%.

In reality things will be a bit worse because you won't immediately be
able to discard the whole rest of the array, and because your argsort
will be accessing data spread over a larger swath of memory (so worse
cache performance). But if this doesn't provide a useful speedup, it's
very unlikely partial sorting will be of any use to you.

Medians and other quantiles, or partitioning tasks, are where it really shines.

Anne
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion

Reply via email to