On Fri, Nov 17, 2023 at 1:54 PM Stefan van der Walt via NumPy-Discussion <
numpy-discussion@python.org> wrote:

> Hi all,
>
> I am trying to sample k N-dimensional vectors from a uniform distribution
> without replacement.
> It seems like this should be straightforward, but I can't seem to pin it
> down.
>
> Specifically, I am trying to get random indices in a d0 x d1 x d2 x ... x
> dN-1 array.
>
> I thought about sneaking a structured dtype into `rng.integers`, but of
> course that doesn't work.
>
> If we had a string sampler, I could sample k unique words (consisting of
> digits), and convert them to indices.
>
> I could over-sample and filter out the non-unique indices. Or iteratively
> draw blocks of samples until I've built up my k unique indices.
>
> The most straightforward solution would be to flatten indices, and to
> sample from those. The integers get large quickly, though. The rng.integers
> docstring suggests that it can handle object arrays for very large integers:
>
> > When using broadcasting with uint64 dtypes, the maximum value (2**64)
> > cannot be represented as a standard integer type.
> > The high array (or low if high is None) must have object dtype, e.g.,
> array([2**64]).
>
> But, that doesn't work:
>
> In [35]: rng.integers(np.array([2**64], dtype=object))
> ValueError: high is out of bounds for int64
>
> Is there an elegant way to handle this problem?
>

The default dtype for the result of `integers()` is the signed `int64`. If
you want to sample from the range `[0, 2**64)`, you need to specify
`dtype=np.uint64`. The text you are reading says that if you want to
specify exactly `2**64` as the exclusive upper bound, you won't be able to
do it with a `np.uint64` array/scalar, because that value is one above the
maximum for the dtype; you'll have to use a plain Python `int` or a
`dtype=object` array to represent `2**64`. It is not saying that you can
draw arbitrary-sized integers.

>>> rng.integers(2**64, dtype=np.uint64)
11569248114014186612
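
For the in-memory case, this combines with the iterate-until-unique idea
from the original message. A sketch (the shape, seed, and k are made up
for illustration):

```python
import numpy as np

# Illustrative values, not from the original message.
shape = (1000, 2000, 50)
total = int(np.prod(shape))   # 100_000_000, fits comfortably in int64
k = 10
rng = np.random.default_rng(12345)

# Draw blocks of flat indices and keep whatever is new; duplicates
# are discarded by the set, so the loop terminates quickly when
# k is small relative to total.
unique = set()
while len(unique) < k:
    unique.update(rng.integers(total, size=k - len(unique)).tolist())

flat = np.fromiter(unique, dtype=np.int64, count=k)
idx = np.unravel_index(flat, shape)   # tuple of 3 index arrays of length k
```

`np.unravel_index` converts the unique flat indices back into a tuple of
per-axis index arrays, ready for fancy indexing.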

If the arrays you are drawing indices for are real in-memory arrays for
present-day 64-bit computers, this should be adequate. If it's a notional
array that is larger, then you'll need actual arbitrary-sized integer
sampling. The builtin `random.randrange()` will do arbitrary-sized integers
and is quite reasonable for this task. If you want it to use our
BitGenerators underneath for clean PRNG state management, this is quite
doable with a simple subclass of `random.Random`:
https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258
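
To give the flavor of that approach (this is my own sketch, not the code
from the linked comment; the class name is made up): override `random()`
and `getrandbits()`, and the stdlib machinery builds arbitrary-sized
integers on top of them.

```python
import random
import numpy as np

class BitGenRandom(random.Random):
    """Sketch: a random.Random whose randomness comes from a NumPy
    BitGenerator.  Name and details are illustrative."""

    def __new__(cls, bit_generator):
        # Don't forward the BitGenerator to the C-level __new__,
        # which would try to interpret it as a seed.
        return super().__new__(cls)

    def __init__(self, bit_generator):
        self._gen = np.random.Generator(bit_generator)
        super().__init__()  # sets up gauss_next etc.; calls our no-op seed()

    def seed(self, *args, **kwargs):
        # State lives in the BitGenerator; ignore stdlib seeding.
        pass

    def random(self):
        # 53-bit float in [0, 1) drawn from the NumPy Generator.
        return float(self._gen.random())

    def getrandbits(self, k):
        # Assemble k bits from 64-bit words; randrange() builds
        # arbitrary-sized integers on top of this method.
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        if k == 0:
            return 0
        nwords = (k + 63) // 64
        result = 0
        for w in self._gen.integers(2**64, size=nwords, dtype=np.uint64).tolist():
            result = (result << 64) | w
        return result >> (nwords * 64 - k)

r = BitGenRandom(np.random.PCG64(12345))
r.randrange(10**30)   # arbitrary-sized integer, uniform on [0, 10**30)

# Without replacement: collect draws in a set until k unique values.
unique = set()
while len(unique) < 5:
    unique.add(r.randrange(10**30))
```

Note that `random.sample(range(total), k)` won't work here once `total`
exceeds what `len()` can report (a C ssize_t), so the draw-into-a-set loop
is the robust route for very large ranges.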

-- 
Robert Kern