On Fri, Nov 17, 2023 at 4:15 PM Aaron Meurer <asmeu...@gmail.com> wrote:
> On Fri, Nov 17, 2023 at 12:10 PM Robert Kern <robert.k...@gmail.com>
> wrote:
> >
> > If the arrays you are drawing indices for are real in-memory arrays for
> > present-day 64-bit computers, this should be adequate. If it's a notional
> > array that is larger, then you'll need actual arbitrary-sized integer
> > sampling. The builtin `random.randrange()` will do arbitrary-sized integers
> > and is quite reasonable for this task. If you want it to use our
> > BitGenerators underneath for clean PRNG state management, this is quite
> > doable with a simple subclass of `random.Random`:
> > https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
>
> Wouldn't it be better to just use random.randint to generate the
> vectors directly at that point? If the number of possibilities is more
> than 2**64, birthday odds of generating the same vector twice are on
> the order of 1 in 2**32. And you can always do a unique() rejection
> check if you really want to be careful.

Yes, I jumped to correcting the misreading of the docstring rather than solving the root problem. Almost certainly, the most straightforward strategy is to call `rng.integers(0, array_shape)` repeatedly, storing tuples of the resulting indices in a set and rejecting duplicates. You'd have to do the same thing if you used `rng.integers(0, np.prod(array_shape))` or `random.randint(0, np.prod(array_shape) - 1)` (note that `random.randint()` includes its upper bound) in the flattened case.

`rng.integers()` doesn't sample without replacement in any case. `rng.choice()` does. It does not support unbounded integers either; it uses a signed `int64` to hold the population size, so it is bounded by that. That is _likely_ to be a fine upper bound on the size of the array (if it is a real array in a 64-bit address space). So one could use `rng.choice(np.prod(array_shape), replace=False, size=n_samples)` and unflatten these integers to the index tuples if that suits the array size. It does the same set lookups internally, only somewhat more efficiently.

--
Robert Kern
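
For concreteness, here is a minimal sketch of the two strategies described above: rejection sampling of index tuples with `rng.integers()`, and flat sampling with `rng.choice(..., replace=False)` followed by unflattening. The function names, the example shape, and the seed are hypothetical, chosen only for illustration, and the use of `np.unravel_index()` for the unflattening step is an assumption (the message does not name a function for it). Both versions assume every dimension, and for the second one the total element count, fits in a signed `int64`.

```python
import math

import numpy as np


def sample_unique_indices_rejection(rng, array_shape, n_samples):
    """Draw n_samples distinct index tuples, rejecting duplicates via a set."""
    total = math.prod(int(d) for d in array_shape)  # Python ints, no overflow
    if n_samples > total:
        raise ValueError("more samples requested than elements in the array")
    seen = set()
    samples = []
    while len(samples) < n_samples:
        # `high` broadcasts over the shape, giving one index per axis;
        # the upper bound is exclusive.
        idx = tuple(int(i) for i in rng.integers(0, array_shape))
        if idx not in seen:
            seen.add(idx)
            samples.append(idx)
    return samples


def sample_unique_indices_choice(rng, array_shape, n_samples):
    """Draw distinct flat indices with rng.choice(), then unflatten them."""
    total = math.prod(int(d) for d in array_shape)  # must fit in int64
    flat = rng.choice(total, size=n_samples, replace=False)
    # unravel_index maps flat offsets back to per-axis indices (C order).
    per_axis = np.unravel_index(flat, array_shape)
    return list(zip(*(axis.tolist() for axis in per_axis)))


rng = np.random.default_rng(12345)  # arbitrary seed for the example
shape = (4, 5, 6)                   # hypothetical array shape
print(sample_unique_indices_rejection(rng, shape, 5))
print(sample_unique_indices_choice(rng, shape, 5))
```

The rejection version slows down as `n_samples` approaches the total number of elements, since most draws are then duplicates; for notional arrays larger than `int64` allows, the `random.randrange()` route mentioned in the quoted message would be needed instead.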