Tim Hochberg wrote:
> My first question is: why? What's the attraction in returning a sorted
> answer here? Returning an unsorted array is potentially faster,
> depending on the algorithm chosen,  and sorting after the fact is
> trivial. If one was going to spend extra complexity on something, I'd
> think it would be better spent on preserving the input order.


There is a unique function in matlab that returns a sorted vector. I think a lot of people will expect a numpy and matlab functions with identical names to behave similarly.

If we want to preserve the input order, we'd have to choose a convention about whose value's order is retained: do we keep the order of the first value found or the last one ?

Here is the benchmark. Sorry Norbert for not including your code the first time, it turns out that with Alain's suggestion, its the fastest one both for lists and arrays.


x = rand(100000)*100
x = x.astype ('i')
l = list(x)

For array x:

In [166]: timeit unique_alan(x) # with set instead of dict
100 loops, best of 3: 8.8 ms per loop

In [167]: timeit unique_norbert(x)
100 loops, best of 3: 8.8 ms per loop

In [168]: timeit unique_sasha(x)
100 loops, best of 3: 10.8 ms per loop

In [169]: timeit unique(x)
10 loops, best of 3: 50.4 ms per loop

In [170]: timeit unique1d(x)
10 loops, best of 3: 13.2 ms per loop


For list l:

In [196]: timeit unique_norbert(l)
10 loops, best of 3: 29 ms per loop

In [197]: timeit unique_alan(l)  # with set instead of dict
10 loops, best of 3: 14.5 ms per loop

In [193]: timeit unique(l)
10 loops, best of 3: 29.6 ms per loop


Note:
In Norbert function, setting sort=False for flattenable objects returns a sorted array anyway. So I'd suggest to remove the sort keyword, sort if the datatype is sortable, and don't sort if its not.

David
from numpy import *
x = rand(100000)*100
x = x.astype('i')
l = list(x)

def unique_sasha(x):
     s = sort(x)
     r = empty(s.shape, float)
     r[:-1] = s[1:]
     r[-1] = NaN # Ensures that the last element is always kept, so that unique([1,1,1]) returns [1]. 
     return s[r != s] 
     
def unique_norbert(arr,sort=True):
    if hasattr(arr,'flatten'):
        tmp = arr.flatten()
        tmp.sort()
        idx = concatenate(([True],tmp[1:]!=tmp[:-1]))
        return tmp[idx]
    else: # for compatibility:
        set = {}
        for item in arr:
            set[item] = None
    if sort:
        return asarray(sorted(set.keys()))
    else:
        return asarray(set.keys())


def unique_alan(arr,sort=True):
    if hasattr(arr,'flatten'):
       tmp = arr.flatten()
       tmp.sort()
       idx = concatenate(([True],tmp[1:]!=tmp[:-1]))
       return tmp[idx]
    else: #for compatability
        items = list(set(arr))
        if sort: items.sort()
        return asarray(items)

-------------------------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/numpy-discussion

Reply via email to