[Numpy-discussion] Re: Speeding up `unique` and adding "kind" parameter

Miles Cranmer Tue, 28 Jun 2022 22:10:27 -0700

Regarding 2., did you have a particular approach in mind? This new lookup table 
method is already O(n) scaling (similar to a counting sort), so I cannot fathom 
a method that, as you suggest, would get significantly better performance for 
integer arrays. The sorting here is "free" in some sense: you aren't
spending additional cycles on it, because the table is initialized pre-sorted.
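To make the lookup-table idea concrete, here is a minimal sketch in NumPy. It is
not the code from the proposed patch. `unique_table` is a hypothetical helper
name, and the sketch assumes an integer input whose values fit in int64. The
table holds one boolean slot per value between the minimum and the maximum, so
strictly speaking the cost is O(n + k) for a value range of size k. That is why
this approach presumably only pays off when the range is not much larger than
the array.

```python
import numpy as np


def unique_table(ar):
    """Hypothetical helper: lookup-table unique for integer arrays.

    Assumes integer input with values that fit in int64. Memory scales
    with ar.max() - ar.min() + 1, not with len(ar).
    """
    ar = np.asarray(ar).ravel()
    if not np.issubdtype(ar.dtype, np.integer):
        raise TypeError("lookup-table unique only applies to integer arrays")
    if ar.size == 0:
        return ar.copy()
    lo, hi = int(ar.min()), int(ar.max())
    # One slot per possible value; slot i stands for value lo + i,
    # so the table is "pre-sorted" by construction.
    seen = np.zeros(hi - lo + 1, dtype=bool)
    # Single O(n) pass marking which values occur (cast to int64 first
    # so the offset cannot overflow a narrow dtype such as int8).
    seen[ar.astype(np.int64) - lo] = True
    # Reading the set slots back in index order yields sorted output
    # without any comparison sort.
    return (np.flatnonzero(seen) + lo).astype(ar.dtype)
```

For example, `unique_table(np.array([3, 1, 3, -2, 7, 1]))` gives the same
result as `np.unique` on that array, `[-2, 1, 3, 7]`.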


I could see a similar approach being created in the future for
arbitrary datatypes, using a hash table rather than a fixed-size
array. In this case you would similarly get (expected) O(n) performance, but
would produce an **unsorted** output. But in any case a hash table would
probably be much less efficient for integers than an array as implemented here,
and the latter also gives you a sorted output as a side effect. Personally I
have used np.unique for integer data far more than any other type, and I would
_guess_ it is similar in the broader community (though I have no data or
insight to support that), so I think having a big speedup like this could be
quite nice for users.
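For contrast, a hash-based variant might look like the sketch below.
`unique_hashed` is a hypothetical name, and it uses a plain Python `dict` (a
hash table that preserves insertion order) purely to show the behaviour: it
accepts any hashable element type, but the output comes back in first-seen
order rather than sorted. Because it goes through Python objects, it says
nothing about how fast a native hash-table implementation inside NumPy would be.

```python
import numpy as np


def unique_hashed(ar):
    """Hypothetical helper: hash-based unique for any hashable dtype.

    Expected O(n) time, but the output is in first-seen order, not sorted.
    """
    ar = np.asarray(ar).ravel()
    # dict.fromkeys acts as an insertion-ordered hash set (Python 3.7+).
    seen = dict.fromkeys(ar.tolist())
    return np.array(list(seen), dtype=ar.dtype)
```

On `[3, 1, 3, -2, 7, 1]` this returns `[3, 1, -2, 7]`; an extra sort would be
needed to match `np.unique`, which the table-based version gets for free.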

Thanks,
Miles
_______________________________________________
NumPy-Discussion mailing list
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/