On Wed, Jun 29, 2022 at 2:59 PM Miles Cranmer <miles.cran...@gmail.com>
wrote:

> So, this new method is in fact a hash table as discussed in that blog
> post. However, because it assumes integer arrays, we can go even further
> than that blog, and simply use `np.arange(ar_min, ar_max + 1)` is the "hash
> table". Thus, you don't actually need to use a hashing function at all, you
> can just use the integer itself as the hash of an integer, and use fast
> array operations. This is why it happens to return a sorted output, even
> though it's not technically necessary. The scaling is the same, but the
> overall constant in front of O(n) is much lower.
>
> In other words, returning a sorted array for integers gets you the best
> performance since you can use ``hash(integer)=integer`` (+ assume that your
> range of integers is relatively small). But for other datatypes like
> strings, as discussed on that blog post, ``hash(string)`` might have a
> large range over your strings, and it would be inefficient to allocate the
> entire thing.
>
> The closest analogy to this comparison would be counting sorts vs radix
> sorts. A radix sort (https://en.wikipedia.org/wiki/Radix_sort) uses a
> hash table, whereas a counting sort (
> https://en.wikipedia.org/wiki/Counting_sort) pre-allocates an array of
> length equal to the range of integer data. A radix sort has scaling O(n*k),
> for k the key size, and a counting sort has scaling O(n+k), for k the range
> of your data. However, a counting sort might have a better constant in
> front of the scaling, for fixed k, since it avoids having to create the
> hash table, and gets to use super efficient array operations. This PR is
> analogous to a counting sort, but a radix sort-like method is described on
> that blog (and would be useful for generic types).
>
> Does this help explain things?
>

That is indeed helpful, thanks Miles!

Cheers,
Ralf
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

Reply via email to