On Fri, Nov 17, 2023 at 4:15 PM Aaron Meurer <asmeu...@gmail.com> wrote:

> On Fri, Nov 17, 2023 at 12:10 PM Robert Kern <robert.k...@gmail.com>
> wrote:
> >
> > If the arrays you are drawing indices for are real in-memory arrays for
> > present-day 64-bit computers, this should be adequate. If it's a notional
> > array that is larger, then you'll need actual arbitrary-sized integer
> > sampling. The builtin `random.randrange()` will do arbitrary-sized integers
> > and is quite reasonable for this task. If you want it to use our
> > BitGenerators underneath for clean PRNG state management, this is quite
> > doable with a simple subclass of `random.Random`:
> > https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
>
> Wouldn't it be better to just use random.randint to generate the
> vectors directly at that point? If the number of possibilities is more
> than 2**64, birthday odds of generating the same vector twice are on
> the order of 1 in 2**32. And you can always do a unique() rejection
> check if you really want to be careful.
>

Yes, I jumped to correcting the misreading of the docstring rather than
solving the root problem. Almost certainly, the most straightforward
strategy is to use `rng.integers(0, array_shape)` repeatedly, storing
tuples of the resulting indices in a set and rejecting duplicates. You'd
have to do the same thing if you used `rng.integers(0,
np.prod(array_shape))` or `random.randrange(np.prod(array_shape))` (note
that `random.randint()` includes its upper bound) in the flattened case.
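
A minimal sketch of that rejection loop, assuming each individual dimension
fits in an `int64` (the helper name `sample_unique_indices` is made up for
illustration, not a NumPy function):

```python
import numpy as np


def sample_unique_indices(rng, array_shape, n_samples):
    """Draw n_samples distinct index tuples for an array of shape array_shape."""
    # Count the elements with Python ints so a huge notional shape cannot overflow.
    total = 1
    for dim in array_shape:
        total *= int(dim)
    if n_samples > total:
        raise ValueError("cannot draw more distinct indices than there are elements")

    seen = set()
    while len(seen) < n_samples:
        # One draw per axis; the upper bound of each axis is exclusive.
        idx = rng.integers(0, array_shape)
        # Adding to a set silently rejects a duplicate index tuple.
        seen.add(tuple(int(i) for i in idx))
    return seen


rng = np.random.default_rng(12345)
indices = sample_unique_indices(rng, (3, 4, 5), n_samples=10)
```

The loop is only cheap while `n_samples` is small relative to the number of
elements; as the two get close, most draws are rejected.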

`rng.integers()` doesn't sample without replacement in any case.
`rng.choice()` does. It also will not support unbounded integers; it uses a
signed `int64` to hold the population size, so it is bounded to that. That
is _likely_ to be a fine upper bound on the size of the array (if it is a
real array in a 64-bit address space). So one could use
`rng.choice(np.prod(array_shape), replace=False, size=n_samples)` and
unflatten these integers to the index tuples if that suits the array size.
Internally, it does the same kind of set lookups, only somewhat more
efficiently.
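
A minimal sketch of the flattened variant, assuming the total number of
elements fits in a signed `int64`; the shape and sample count below are
arbitrary illustration values, and `np.unravel_index` does the unflattening:

```python
import numpy as np

rng = np.random.default_rng(12345)
array_shape = (4, 5, 6)  # illustrative shape
n_samples = 7            # illustrative sample count

# Sample distinct flat indices from range(prod(array_shape)).
# int() just hands choice() a plain Python integer population size.
flat = rng.choice(int(np.prod(array_shape)), replace=False, size=n_samples)

# Convert each flat index back into a per-axis index tuple (C order).
per_axis = np.unravel_index(flat, array_shape)
index_tuples = [tuple(int(i) for i in idx) for idx in zip(*per_axis)]
```

Because the flat indices are distinct, the unflattened tuples are distinct
too, so no extra duplicate check is needed.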

-- 
Robert Kern