On Tue, Dec 11, 2018 at 2:27 PM Stephan Hoyer <[email protected]> wrote:
> On Tue, Dec 11, 2018 at 10:39 AM Warren Weckesser <[email protected]> wrote:
>
>> There is no bug, just a limitation in the API.
>>
>> When I draw without replacement, say, three values from a collection of
>> length five, the three values that I get are not independent. So really,
>> this is *one* sample from a three-dimensional (discrete-valued)
>> distribution. The problem with the current API is that I can't get
>> multiple samples from this three-dimensional distribution in one call. If
>> I need to repeat the process six times, I have to use a loop, e.g.:
>>
>> >>> samples = [np.random.choice([10, 20, 30, 40, 50], replace=False,
>> ...                             size=3) for _ in range(6)]
>>
>> With the `select` function I described in my previous email, which I'll
>> call `random_select` here, the parameter that determines the number of
>> items per sample, `nsample`, is separate from the parameter that determines
>> the number of samples, `size`:
>>
>> >>> samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6)
>> >>> samples
>> array([[30, 40, 50],
>>        [40, 50, 30],
>>        [10, 20, 40],
>>        [20, 30, 50],
>>        [40, 20, 50],
>>        [20, 10, 30]])
>>
>> (`select` is a really bad name, since `numpy.select` already exists and
>> is something completely different. I had the longer name `random.select`
>> in mind when I started using it. "There are only two hard problems..." etc.)
>>
>> Warren
>
> This is an issue for the probability distributions from scipy.stats, too.
>
> The only library that I know handles this well is TensorFlow Probability,
> which has a notion of "batch" vs "events" dimensions in distributions. It's
> actually pretty comprehensive, and makes it easy to express these sorts of
> operations:
>
> >>> import tensorflow_probability as tfp
> >>> import tensorflow as tf
> >>> tf.enable_eager_execution()
> >>> dist = tfp.distributions.Categorical(tf.zeros((3, 5)))
> >>> dist
> <tfp.distributions.Categorical 'Categorical/' batch_shape=(3,)
> event_shape=() dtype=int32>
> >>> dist.sample(6)
> <tf.Tensor: id=299, shape=(6, 3), dtype=int32, numpy=
> array([[1, 2, 1],
>        [2, 1, 3],
>        [4, 4, 2],
>        [0, 1, 1],
>        [0, 2, 2],
>        [2, 0, 4]], dtype=int32)>

Yes, tensorflow-probability includes broadcasting of the parameters and
generating multiple variates in one call, but note that your example is not
sampling without replacement. For sampling 3 items without replacement from
a population, the *event_shape* (to use tensorflow-probability terminology)
would have to be (3,).

Warren
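
For illustration, here is a rough sketch of how the behaviour discussed above (several independent draws, each of `nsample` items without replacement, in one call) could be written with plain NumPy. This is not an existing NumPy function: the name `random_select` and its `nsample`/`size` parameters are taken from the proposal quoted above, the `rng` argument is an addition made here, and the sketch assumes `np.random.default_rng` is available (NumPy 1.17 or later). It only handles a 1-d population with uniform probabilities.

```python
import numpy as np


def random_select(items, nsample, size, rng=None):
    """Hypothetical sketch: draw `size` samples, each holding `nsample`
    items chosen without replacement from the 1-d collection `items`."""
    rng = np.random.default_rng() if rng is None else rng
    items = np.asarray(items)
    n = items.shape[0]
    if nsample > n:
        raise ValueError("nsample cannot exceed the number of items")
    # Argsorting a row of i.i.d. uniform keys gives a uniformly random
    # permutation of 0..n-1; each row is an independent permutation.
    keys = rng.random((size, n))
    idx = np.argsort(keys, axis=1)[:, :nsample]
    # Result has shape (size, nsample): the "event" dimension is last.
    return items[idx]


samples = random_select([10, 20, 30, 40, 50], nsample=3, size=6,
                        rng=np.random.default_rng(0))
print(samples.shape)
```

Each row of `samples` is one draw without replacement, so values never repeat within a row, while separate rows are independent of each other. Sorting a full row of keys costs more than necessary when `nsample` is much smaller than the population; it is used here only because it is short.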
_______________________________________________ NumPy-Discussion mailing list [email protected] https://mail.python.org/mailman/listinfo/numpy-discussion
