On Tue, Dec 11, 2018 at 10:39 AM Warren Weckesser <warren.weckes...@gmail.com> wrote:
> There is no bug, just a limitation in the API.
>
> When I draw without replacement, say, three values from a collection of
> length five, the three values that I get are not independent. So really,
> this is *one* sample from a three-dimensional (discrete-valued)
> distribution. The problem with the current API is that I can't get
> multiple samples from this three-dimensional distribution in one call. If
> I need to repeat the process six times, I have to use a loop, e.g.:
>
> >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False, size=3)
> ...            for _ in range(6)]
>
> With the `select` function I described in my previous email, which I'll
> call `random_select` here, the parameter that determines the number of
> items per sample, `nsample`, is separate from the parameter that determines
> the number of samples, `size`:
>
> >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
> >>> samples
> array([[30, 40, 50],
>        [40, 50, 30],
>        [10, 20, 40],
>        [20, 30, 50],
>        [40, 20, 50],
>        [20, 10, 30]])
>
> (`select` is a really bad name, since `numpy.select` already exists and is
> something completely different. I had the longer name `random.select` in
> mind when I started using it. "There are only two hard problems..." etc.)
>
> Warren
>

This is an issue for the probability distributions from scipy.stats, too.
The only library I know of that handles this well is TensorFlow
Probability, which distinguishes "batch" dimensions from "event" dimensions
in distributions.
It's actually pretty comprehensive, and makes it easy to express these
sorts of operations:

>>> import tensorflow_probability as tfp
>>> import tensorflow as tf
>>> tf.enable_eager_execution()
>>> dist = tfp.distributions.Categorical(tf.zeros((3, 5)))
>>> dist
<tfp.distributions.Categorical 'Categorical/' batch_shape=(3,) event_shape=() dtype=int32>
>>> dist.sample(6)
<tf.Tensor: id=299, shape=(6, 3), dtype=int32, numpy=
array([[1, 2, 1],
       [2, 1, 3],
       [4, 4, 2],
       [0, 1, 1],
       [0, 2, 2],
       [2, 0, 4]], dtype=int32)>
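For comparison, both shape conventions discussed here can be roughly sketched in plain NumPy, assuming the `np.random.default_rng` Generator API. `random_select` borrows the name and the `nsample`/`size` parameters from Warren's message, but its implementation (argsort of i.i.d. uniform keys, one row per sample) is an assumption for illustration, not a NumPy function. `sample_categorical` is a hypothetical helper that mimics the "sample shape, then batch shape" layout of `Categorical.sample` above using an inverse-CDF lookup. Neither exists in NumPy.

```python
import numpy as np


def random_select(a, nsample, size, rng=None):
    """Hypothetical sketch: `size` independent samples, each made of `nsample`
    distinct items of 1-D `a` chosen uniformly without replacement.
    Returns an array of shape (size, nsample)."""
    a = np.asarray(a)
    if a.ndim != 1:
        raise ValueError("a must be one-dimensional")
    if not 0 <= nsample <= a.shape[0]:
        raise ValueError("nsample must be between 0 and len(a)")
    rng = np.random.default_rng(rng)
    # One row of i.i.d. uniform keys per sample; argsort of each row is a
    # uniformly random permutation of range(len(a)), independent across rows.
    keys = rng.random((size, a.shape[0]))
    # The first `nsample` entries of a random permutation form a uniform
    # sample without replacement.
    idx = np.argsort(keys, axis=1)[:, :nsample]
    return a[idx]


def sample_categorical(probs, sample_shape, rng=None):
    """Hypothetical sketch: `probs` has shape batch_shape + (k,), one
    categorical distribution per batch entry (rows are normalized here).
    Returns integer draws of shape sample_shape + batch_shape."""
    probs = np.asarray(probs, dtype=float)
    probs = probs / probs.sum(axis=-1, keepdims=True)
    if isinstance(sample_shape, int):
        sample_shape = (sample_shape,)
    rng = np.random.default_rng(rng)
    cdf = np.cumsum(probs, axis=-1)  # batch_shape + (k,)
    # Uniform draws with a trailing axis of length 1 so they broadcast
    # against the k CDF values of each batch entry.
    u = rng.random(tuple(sample_shape) + probs.shape[:-1] + (1,))
    # Category i satisfies cdf[i-1] <= u < cdf[i], i.e. i equals the number
    # of CDF values that are <= u.
    idx = (u >= cdf).sum(axis=-1)
    # Guard against rounding that leaves cdf[-1] slightly below 1.
    return np.minimum(idx, probs.shape[-1] - 1)
```

With these, `random_select([10, 20, 30, 40, 50], nsample=3, size=6)` returns a (6, 3) array whose rows each hold three distinct values, as in Warren's example, and `sample_categorical(np.ones((3, 5)), 6)` returns a (6, 3) integer array with entries in 0..4, matching the shape of `dist.sample(6)` above. On NumPy 1.20 and later, `Generator.permuted(x, axis=1)` is another way to shuffle each row of a tiled array independently before slicing off `nsample` columns.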
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@python.org
https://mail.python.org/mailman/listinfo/numpy-discussion