[Numpy-discussion] Speeding up isin1d and adding a "method" or similar

Sebastian Berg Thu, 16 Jun 2022 06:15:34 -0700

Hi all,

there is a PR to add a faster path to `np.isin` that uses a lookup
table covering all the elements included in the haystack
(`test_elements`):


    https://github.com/numpy/numpy/pull/12065/files

Such a table can carry a very significant memory overhead, but the
speedup can be just as significant, so the idea came up of adding an
option to pick which version is used.
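
To make the trade-off concrete, here is a minimal sketch of the table
idea for integer input. It is not the code from the PR; the function
name `isin_lookup_table` is made up for illustration, and it assumes
that all values fit in `int64`:

```python
import numpy as np


def isin_lookup_table(ar1, ar2):
    """Illustrative table-based membership test (hypothetical helper)."""
    # Assumption: integer or boolean input whose values fit in int64.
    ar1 = np.asarray(ar1).astype(np.int64)
    ar2 = np.asarray(ar2).astype(np.int64)
    result = np.zeros(ar1.shape, dtype=bool)
    if ar2.size == 0:
        return result

    lo = int(ar2.min())
    hi = int(ar2.max())

    # One boolean slot per value in [lo, hi]; this is the memory cost.
    table = np.zeros(hi - lo + 1, dtype=bool)
    table[ar2 - lo] = True

    # Only values inside [lo, hi] can be present, so index the table
    # for those and leave everything else as False.
    in_range = (ar1 >= lo) & (ar1 <= hi)
    result[in_range] = table[ar1[in_range] - lo]
    return result


mask = isin_lookup_table([0, 1, 2, 5, 0], [0, 2])
# mask.tolist() == [True, False, True, False, True]
```

The table has one entry per value between the minimum and maximum of
`ar2`, so its size depends on the value range rather than on the
number of elements.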

The current draft documentation for this new `method` keyword argument
is included below.  The main questions are:

* Is there any concern about adding such a new kwarg?
* Is `method` the best name?  Sorting uses `kind`, which may also be a
  good choice.

There is also the smaller question of what heuristic 'auto' would use,
but that can be tweaked at any time.

```
   method : {'auto', 'sort', 'dictionary'}, optional
         The algorithm to use. This will not affect the final result,
         but will affect the speed. Default is 'auto'.

         - If 'sort', will use a mergesort-based approach. This will have
           a memory usage of roughly 6 times the sum of the sizes of
           `ar1` and `ar2`, not accounting for size of dtypes.
         - If 'dictionary', will use a key-dictionary approach similar
           to a counting sort. This is only available for boolean and
           integer arrays. This will have a memory usage of the
           size of `ar1` plus the max-min value of `ar2`. This tends
           to be the faster method if the following formula is true:
           `log10(len(ar2)) > (log10(max(ar2)-min(ar2)) - 2.27) / 0.927`,
           but may use greater memory.
         - If 'auto', will automatically choose the method which is
           expected to perform the fastest, using the above
           formula. For larger sizes or smaller range,
           'dictionary' is chosen. For larger range or smaller
           sizes, 'sort' is chosen.
```
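
As a sketch of how the formula in the draft docstring could be
evaluated: the helper name `table_expected_faster` and its handling of
an empty `ar2` and of a zero value range are assumptions made for
illustration, not taken from the PR.

```python
import math


def table_expected_faster(ar2_size, ar2_min, ar2_max):
    """Evaluate the draft docstring's inequality (hypothetical helper)."""
    # Assumption: with an empty ar2 there is nothing to tabulate.
    if ar2_size <= 0:
        return False

    value_range = ar2_max - ar2_min
    # Assumption: a zero range means a one-entry table, so treat the
    # inequality as satisfied instead of evaluating log10(0).
    if value_range <= 0:
        return True

    # log10(len(ar2)) > (log10(max(ar2) - min(ar2)) - 2.27) / 0.927
    return math.log10(ar2_size) > (math.log10(value_range) - 2.27) / 0.927


# Many elements, moderate range: the inequality holds.
print(table_expected_faster(1_000_000, 0, 1_000_000))
# Few elements spread over the same range: it does not.
print(table_expected_faster(1_000, 0, 1_000_000))
```

With a range of 10**6 the right-hand side is about 4.02, so the
inequality starts to hold once `ar2` has a bit more than 10**4
elements.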

Cheers,

Sebastian


