On Wed, Jan 18, 2017 at 4:13 PM Nadav Har'El <n...@scylladb.com> wrote:

> On Wed, Jan 18, 2017 at 4:30 PM, <josef.p...@gmail.com> wrote:
>>
>> Having more sampling schemes would be useful, but it's not possible to implement sampling schemes with impossible properties.
>>
>> BTW: sampling 3 out of 3 without replacement is even worse
>>
>> No matter what sampling scheme and what selection probabilities we use, we always have every element with probability 1 in the sample.
>
> I agree. The random-sample function of the type I envisioned will be able to reproduce the desired probabilities in some cases (like the example I gave) but not in others. Because doing this correctly involves a set of n linear equations in comb(n,k) variables, it can have no solution, or many solutions, depending on the n and k, and the desired probabilities. A function of this sort could return an error if it can't achieve the desired probabilities.

It seems to me that the basic problem here is that the numpy.random.choice docstring fails to explain what the function actually does when called with weights and without replacement. Clearly there are different expectations; I think numpy.random.choice chose one that is easy to explain and implement but not necessarily what everyone expects. So the docstring should be clarified. Perhaps a Notes section:

When numpy.random.choice is called with replace=False and non-uniform probabilities, the resulting distribution of samples is not obvious. numpy.random.choice effectively follows the procedure: when choosing the kth element in a set, the probability of element i occurring is p[i] divided by the total probability of all not-yet-chosen (and therefore eligible) elements. This approach is always possible as long as the sample size is no larger than the population, but it means that the probability that element i occurs in the sample is not exactly p[i].

Anne
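
The sequential procedure described in the proposed Notes text can be checked with a small sketch. This is not NumPy's implementation: it is a pure-Python enumeration, assuming strictly positive weights that sum to 1, and the helper name `inclusion_probabilities` and the example weights are made up for illustration. It computes the exact probability that each element ends up in the sample, so the result can be compared against p.

```python
from itertools import permutations


def inclusion_probabilities(p, k):
    """Exact probability that each element appears in a size-k sample.

    Hypothetical helper: draws are sequential and without replacement,
    and at each step an eligible element i is picked with probability
    p[i] divided by the total weight of the not-yet-chosen elements.
    Assumes all weights are strictly positive and sum to 1.
    """
    n = len(p)
    incl = [0.0] * n
    # Enumerate every ordered sample of k distinct indices.
    for order in permutations(range(n), k):
        prob = 1.0
        remaining = 1.0  # total weight of still-eligible elements
        for i in order:
            prob *= p[i] / remaining
            remaining -= p[i]
        # Every element in this ordered sample is "in the sample".
        for i in order:
            incl[i] += prob
    return incl


p = [0.5, 0.3, 0.2]  # illustrative weights, not from the thread
incl = inclusion_probabilities(p, 2)
print(incl)
```

For these weights and a sample of size 2, the inclusion probabilities sum to 2 but match neither p nor 2*p. With a sample of size 3 out of 3, every element is included with probability 1 regardless of the weights, which is the degenerate case mentioned earlier in the thread.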
