On Wed, Jun 29, 2022 at 7:10 AM Miles Cranmer <miles.cran...@gmail.com> wrote:
> Regarding 2., did you have a particular approach in mind? This new lookup > table method is already O(n) scaling (similar to a counting sort), so I > cannot fathom a method that, as you suggest, would get significantly better > performance for integer arrays. The sorting here is "free" in some sense > since you aren't spending additional cycles, the table is initialized > pre-sorted. > I'm not an expert, just remember having seen talks and implementations using hashing that claim to be a lot faster. https://douglasorr.github.io/2019-09-hash-vs-sort/article.html is quite interesting, and shows it's a complex topic. Did you look for literature on this? I'd imagine that that exists. Your PRs don't have a References section in the docstring; if this is a published method then that would be great to add. Cheers, Ralf > I could see in the future that one could create a similar approach for > arbitrary datatypes, and you would use a hash table rather than a > fixed-size array. In this case you would similarly get O(n) performance, > and would produce an **unsorted** output. But in any case a hash table > would probably be much less efficient for integers than an array as > implemented here - the latter which also gives you a sorted output as a > side effect. Personally I have used np.unique for integer data far more > than any other type, and I would _guess_ it is similar in the broader > community (though I have no data or insight to support that)–so I think > having a big speedup like this could be quite nice for users. > > Thanks, > Miles > _______________________________________________ > NumPy-Discussion mailing list -- numpy-discussion@python.org > To unsubscribe send an email to numpy-discussion-le...@python.org > https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ > Member address: ralf.gomm...@googlemail.com >
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com