On Wed, Jun 29, 2022 at 7:10 AM Miles Cranmer <miles.cran...@gmail.com>
wrote:

> Regarding 2., did you have a particular approach in mind? This new lookup
> table method is already O(n) scaling (similar to a counting sort), so I
> cannot fathom a method that, as you suggest, would get significantly better
> performance for integer arrays. The sorting here is "free" in some sense
> since you aren't spending additional cycles, the table is initialized
> pre-sorted.
>

I'm not an expert, just remember having seen talks and implementations
using hashing that claim to be a lot faster.
https://douglasorr.github.io/2019-09-hash-vs-sort/article.html  is quite
interesting, and shows it's a complex topic.

Did you look for literature on this? I'd imagine that that exists. Your PRs
don't have a References section in the docstring; if this is a published
method then that would be great to add.

Cheers,
Ralf


> I could see in the future that one could create a similar approach for
> arbitrary datatypes, and you would use a hash table rather than a
> fixed-size array. In this case you would similarly get O(n) performance,
> and would produce an **unsorted** output. But in any case a hash table
> would probably be much less efficient for integers than an array as
> implemented here - the latter which also gives you a sorted output as a
> side effect. Personally I have used np.unique for integer data far more
> than any other type, and I would _guess_ it is similar in the broader
> community (though I have no data or insight to support that)–so I think
> having a big speedup like this could be quite nice for users.
>
> Thanks,
> Miles
> _______________________________________________
> NumPy-Discussion mailing list -- numpy-discussion@python.org
> To unsubscribe send an email to numpy-discussion-le...@python.org
> https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
> Member address: ralf.gomm...@googlemail.com
>
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to