> My first question is: why? What's the attraction in returning a sorted
> answer here? Returning an unsorted array is potentially faster,
> depending on the algorithm chosen, and sorting after the fact is
> trivial. If one was going to spend extra complexity on something, I'd
> think it would be better spent on preserving the input order.
There is a unique function in matlab that returns a sorted vector. I think a lot of people will expect a numpy and matlab functions with identical names to behave similarly.
If we want to preserve the input order, we'd have to choose a convention about whose value's order is retained: do we keep the order of the first value found or the last one ?
Here is the benchmark. Sorry Norbert for not including your code the first time, it turns out that with Alain's suggestion, its the fastest one both for lists and arrays.
x = rand(100000)*100
x = x.astype ('i')
l = list(x)
For array x:
In [166]: timeit unique_alan(x) # with set instead of dict
100 loops, best of 3: 8.8 ms per loop
In [167]: timeit unique_norbert(x)
100 loops, best of 3: 8.8 ms per loop
In [168]: timeit unique_sasha(x)
100 loops, best of 3: 10.8 ms per loop
In [169]: timeit unique(x)
10 loops, best of 3: 50.4 ms per loop
In [170]: timeit unique1d(x)
10 loops, best of 3: 13.2 ms per loop
For list l:
In [196]: timeit unique_norbert(l)
10 loops, best of 3: 29 ms per loop
In [197]: timeit unique_alan(l) # with set instead of dict
10 loops, best of 3: 14.5 ms per loop
In [193]: timeit unique(l)
10 loops, best of 3: 29.6 ms per loop
Note:
In Norbert function, setting sort=False for flattenable objects returns a sorted array anyway. So I'd suggest to remove the sort keyword, sort if the datatype is sortable, and don't sort if its not.
David
from numpy import * x = rand(100000)*100 x = x.astype('i') l = list(x) def unique_sasha(x): s = sort(x) r = empty(s.shape, float) r[:-1] = s[1:] r[-1] = NaN # Ensures that the last element is always kept, so that unique([1,1,1]) returns [1]. return s[r != s] def unique_norbert(arr,sort=True): if hasattr(arr,'flatten'): tmp = arr.flatten() tmp.sort() idx = concatenate(([True],tmp[1:]!=tmp[:-1])) return tmp[idx] else: # for compatibility: set = {} for item in arr: set[item] = None if sort: return asarray(sorted(set.keys())) else: return asarray(set.keys())
def unique_alan(arr,sort=True): if hasattr(arr,'flatten'): tmp = arr.flatten() tmp.sort() idx = concatenate(([True],tmp[1:]!=tmp[:-1])) return tmp[idx] else: #for compatability items = list(set(arr)) if sort: items.sort() return asarray(items)
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion