On Fri, Nov 17, 2023 at 1:54 PM Stefan van der Walt via NumPy-Discussion <numpy-discussion@python.org> wrote:
> Hi all,
>
> I am trying to sample k N-dimensional vectors from a uniform distribution
> without replacement. It seems like this should be straightforward, but I
> can't seem to pin it down.
>
> Specifically, I am trying to get random indices in a d0 x d1 x d2 x ... x
> dN-1 array.
>
> I thought about sneaking a structured dtype into `rng.integers`, but of
> course that doesn't work.
>
> If we had a string sampler, I could sample k unique words (consisting of
> digits) and convert them to indices.
>
> I could over-sample and filter out the non-unique indices. Or iteratively
> draw blocks of samples until I've built up my k unique indices.
>
> The most straightforward solution would be to flatten indices and to
> sample from those. The integers get large quickly, though. The
> `rng.integers` docstring suggests that it can handle object arrays for
> very large integers:
>
> > When using broadcasting with uint64 dtypes, the maximum value (2**64)
> > cannot be represented as a standard integer type. The high array (or
> > low if high is None) must have object dtype, e.g., array([2**64]).
>
> But that doesn't work:
>
>     In [35]: rng.integers(np.array([2**64], dtype=object))
>     ValueError: high is out of bounds for int64
>
> Is there an elegant way to handle this problem?

The default dtype for the result of `integers()` is the signed `int64`. If you want to sample from the range `[0, 2**64)`, you need to specify `dtype=np.uint64`. The text you are reading is saying that if you want to specify exactly `2**64` as the exclusive upper bound, you won't be able to do it with a `np.uint64` array/scalar, because it is one above the maximum for that dtype; you'll have to use a plain Python `int` object or a `dtype=object` array in order to represent `2**64`. It is not saying that you can draw arbitrary-sized integers.
    >>> rng.integers(2**64, dtype=np.uint64)
    11569248114014186612

If the arrays you are drawing indices for are real in-memory arrays on present-day 64-bit computers, this should be adequate. If it's a notional array that is larger, then you'll need actual arbitrary-sized integer sampling. The builtin `random.randrange()` will do arbitrary-sized integers and is quite reasonable for this task. If you want it to use our BitGenerators underneath for clean PRNG state management, that is quite doable with a simple subclass of `random.Random`:

https://github.com/numpy/numpy/issues/24458#issuecomment-1685022258

--
Robert Kern
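The subclassing idea in that link can be sketched as follows: override `random()` and `getrandbits()` so that the stdlib's `random.Random` machinery (including `randrange()` and `sample()`, which handle arbitrary-sized integers) draws all of its bits from a NumPy `BitGenerator`. The class name and the exact method bodies here are my own illustration, not necessarily what the linked comment does; it then solves the original problem by drawing unique flat indices with `sample()` on a `range` (which is never materialized) and unraveling them.

```python
import math
import random

import numpy as np


class BitGenRandom(random.Random):
    """random.Random subclass that draws its bits from a NumPy BitGenerator."""

    def __init__(self, bit_generator):
        self._bit_gen = bit_generator
        super().__init__()

    def random(self):
        # A 53-bit float in [0, 1) built from one raw 64-bit draw.
        return (int(self._bit_gen.random_raw()) >> 11) / (1 << 53)

    def getrandbits(self, k):
        # Assemble k bits from as many raw 64-bit words as needed;
        # randrange()/sample() route through this, so arbitrary sizes work.
        words = -(-k // 64)  # ceil(k / 64)
        out = 0
        for _ in range(words):
            out = (out << 64) | int(self._bit_gen.random_raw())
        return out >> (64 * words - k)


# Back to the original problem: k unique indices into a d0 x d1 x ... x dN-1
# array, via distinct flat indices over the exact (Python int) total size.
shape = (10**6, 10**6, 10**6)            # notional array with 10**18 elements
n = math.prod(shape)                      # exact Python int, no overflow
rng = BitGenRandom(np.random.PCG64(12345))
flat = rng.sample(range(n), k=5)          # 5 distinct flat indices
coords = [np.unravel_index(i, shape) for i in flat]
```

Because the state lives entirely in the `BitGenerator`, seeding `PCG64` the same way reproduces the same draws, and you can keep using NumPy's spawning/jumping machinery for stream management.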
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com