On Wed, Jun 29, 2022 at 2:59 PM Miles Cranmer <miles.cran...@gmail.com> wrote:
> So, this new method is in fact a hash table as discussed in that blog > post. However, because it assumes integer arrays, we can go even further > than that blog, and simply use `np.arange(ar_min, ar_max + 1)` is the "hash > table". Thus, you don't actually need to use a hashing function at all, you > can just use the integer itself as the hash of an integer, and use fast > array operations. This is why it happens to return a sorted output, even > though it's not technically necessary. The scaling is the same, but the > overall constant in front of O(n) is much lower. > > In other words, returning a sorted array for integers gets you the best > performance since you can use ``hash(integer)=integer`` (+ assume that your > range of integers is relatively small). But for other datatypes like > strings, as discussed on that blog post, ``hash(string)`` might have a > large range over your strings, and it would be inefficient to allocate the > entire thing. > > The closest analogy to this comparison would be counting sorts vs radix > sorts. A radix sort (https://en.wikipedia.org/wiki/Radix_sort) uses a > hash table, whereas a counting sort ( > https://en.wikipedia.org/wiki/Counting_sort) pre-allocates an array of > length equal to the range of integer data. A radix sort has scaling O(n*k), > for k the key size, and a counting sort has scaling O(n+k), for k the range > of your data. However, a counting sort might have a better constant in > front of the scaling, for fixed k, since it avoids having to create the > hash table, and gets to use super efficient array operations. This PR is > analogous to a counting sort, but a radix sort-like method is described on > that blog (and would be useful for generic types). > > Does this help explain things? > That is indeed helpful, thanks Miles! Cheers, Ralf
_______________________________________________ NumPy-Discussion mailing list -- numpy-discussion@python.org To unsubscribe send an email to numpy-discussion-le...@python.org https://mail.python.org/mailman3/lists/numpy-discussion.python.org/ Member address: arch...@mail-archive.com