> My first question is: why? What's the attraction in returning a sorted
> answer here? Returning an unsorted array is potentially faster,
> depending on the algorithm chosen, and sorting after the fact is
> trivial. If one was going to spend extra complexity on something, I'd
> think it would be better spent on preserving the input order.
There is a unique function in matlab that returns a sorted vector. I think a lot of people will expect a numpy and matlab functions with identical names to behave similarly.
If we want to preserve the input order, we'd have to choose a convention about whose value's order is retained: do we keep the order of the first value found or the last one ?
Here is the benchmark. Sorry Norbert for not including your code the first time, it turns out that with Alain's suggestion, its the fastest one both for lists and arrays.
x = rand(100000)*100
x = x.astype ('i')
l = list(x)
For array x:
In [166]: timeit unique_alan(x) # with set instead of dict
100 loops, best of 3: 8.8 ms per loop
In [167]: timeit unique_norbert(x)
100 loops, best of 3: 8.8 ms per loop
In [168]: timeit unique_sasha(x)
100 loops, best of 3: 10.8 ms per loop
In [169]: timeit unique(x)
10 loops, best of 3: 50.4 ms per loop
In [170]: timeit unique1d(x)
10 loops, best of 3: 13.2 ms per loop
For list l:
In [196]: timeit unique_norbert(l)
10 loops, best of 3: 29 ms per loop
In [197]: timeit unique_alan(l) # with set instead of dict
10 loops, best of 3: 14.5 ms per loop
In [193]: timeit unique(l)
10 loops, best of 3: 29.6 ms per loop
Note:
In Norbert function, setting sort=False for flattenable objects returns a sorted array anyway. So I'd suggest to remove the sort keyword, sort if the datatype is sortable, and don't sort if its not.
David
from numpy import *
x = rand(100000)*100
x = x.astype('i')
l = list(x)
def unique_sasha(x):
s = sort(x)
r = empty(s.shape, float)
r[:-1] = s[1:]
r[-1] = NaN # Ensures that the last element is always kept, so that unique([1,1,1]) returns [1].
return s[r != s]
def unique_norbert(arr,sort=True):
if hasattr(arr,'flatten'):
tmp = arr.flatten()
tmp.sort()
idx = concatenate(([True],tmp[1:]!=tmp[:-1]))
return tmp[idx]
else: # for compatibility:
set = {}
for item in arr:
set[item] = None
if sort:
return asarray(sorted(set.keys()))
else:
return asarray(set.keys())
def unique_alan(arr,sort=True):
if hasattr(arr,'flatten'):
tmp = arr.flatten()
tmp.sort()
idx = concatenate(([True],tmp[1:]!=tmp[:-1]))
return tmp[idx]
else: #for compatability
items = list(set(arr))
if sort: items.sort()
return asarray(items)
------------------------------------------------------------------------- Using Tomcat but need to do more? Need to support web services, security? Get stuff done quickly with pre-integrated technology to make your job easier Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________ Numpy-discussion mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/numpy-discussion
