On Wed, Jan 18, 2017 at 8:53 AM, <josef.p...@gmail.com> wrote:

> On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El <n...@scylladb.com> wrote:
>
>> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com <aleba...@gmail.com>
>> wrote:
>>
>>> Let's look at what the user asked this function, and what it returns:
>>>
>>>> User asks: please give me random pairs of the three items, where item 1
>>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>>
>>>> Function returns: random pairs, where if you make many random returned
>>>> results (as in the law of large numbers) and look at the items they
>>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>>> 0.38333.
>>>> These are not (quite) the probabilities the user asked for...
>>>>
>>>> Can you explain a sense in which the user's requested probabilities (0.2,
>>>> 0.4, 0.4) are actually adhered to in the results which random.choice
>>>> returns?
>>>
>>> I think that the question the user is asking by specifying p is a
>>> slightly different one:
>>> "please give me random pairs of the three items extracted from a
>>> population of 3 items where item 1 has probability of being extracted of
>>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>>> extracted."
>>
>> You are right, if that is what the user wants, numpy.random.choice does
>> the right thing.
>>
>> I'm just wondering whether this is actually what users want, and whether
>> they understand this is what they are getting.
>>
>> As I said, I expected it to generate pairs with, empirically, the desired
>> distribution of individual items. The documentation of numpy.random.choice
>> seemed to me (wrongly) to imply that that's what it does. So I was
>> surprised to realize that it does not.
>
> As Alessandro and you showed, the function returns something that makes
> sense. If the user wants something different, then they need to look for a
> different function, which is however difficult if it doesn't have a
> solution in general.
>
> Sounds to me a bit like a Monty Hall problem. Whether we like it or not,
> or find it counterintuitive, it is what it is given the sampling scheme.
>
> Having more sampling schemes would be useful, but it's not possible to
> implement sampling schemes with impossible properties.
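For reference, the 0.2333 / 0.38333 figures quoted above can be reproduced exactly. This is only a minimal sketch, assuming the sampling scheme is the sequential one Alessandro describes (draw one item according to p, remove it, renormalize the remaining probabilities, draw again). The function name `item_shares` is made up for illustration and is not part of numpy.

```python
from fractions import Fraction
from itertools import permutations


def item_shares(p, k):
    """Exact share of each item among the k items of a sample that is
    drawn sequentially without replacement, renormalizing p after each draw.

    Hypothetical helper for illustration only; not a numpy function.
    """
    n = len(p)
    inclusion = [Fraction(0)] * n
    # Enumerate every ordered sequence of k distinct items.
    for seq in permutations(range(n), k):
        prob = Fraction(1)
        remaining = Fraction(1)  # probability mass not yet removed
        for i in seq:
            prob *= p[i] / remaining  # renormalized draw probability
            remaining -= p[i]
        # Each item in this sequence is "included" with probability prob.
        for i in seq:
            inclusion[i] += prob
    # Inclusion probabilities sum to k; divide by k to get item shares.
    return [pi / k for pi in inclusion]


p = [Fraction(2, 10), Fraction(4, 10), Fraction(4, 10)]
shares = item_shares(p, 2)
print(shares)                      # exact fractions: 7/30, 23/60, 23/60
print([float(s) for s in shares])  # roughly 0.2333, 0.3833, 0.3833
```

So under that assumption the requested p gives the probabilities of the first draw only, not the shares of the items in the returned pairs.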

BTW: sampling 3 out of 3 without replacement is even worse.

No matter what sampling scheme and what selection probabilities we use, we
always have every element with probability 1 in the sample.
(Which in survey statistics implies that the sampling error or standard
deviation of any estimate of a population mean or total is zero. Which I
found weird. How can you do statistics and get an estimate that doesn't have
any uncertainty associated with it?)

Josef

>
> Josef
>
>> Nadav.
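To illustrate the 3-out-of-3 remark, here is a small simulation sketch. It assumes a sequential weighted draw without replacement written in plain Python. The helper `draw_without_replacement` and the population values in `y` are made up for the example.

```python
import random
import statistics


def draw_without_replacement(items, p, k, rng):
    """Sequentially draw k distinct items, weighting each draw by the
    probabilities of the items that are still available.

    Hypothetical helper for illustration only.
    """
    items, weights = list(items), list(p)
    sample = []
    for _ in range(k):
        idx = rng.choices(range(len(items)), weights=weights)[0]
        sample.append(items.pop(idx))
        weights.pop(idx)
    return sample


rng = random.Random(0)
y = {"a": 10.0, "b": 20.0, "c": 60.0}  # made-up population values

# Draw 3 out of 3 many times and estimate the population total each time.
totals = [
    sum(y[i] for i in draw_without_replacement(y, [0.2, 0.4, 0.4], 3, rng))
    for _ in range(1000)
]
print(set(totals))                # only one distinct value: the true total
print(statistics.pstdev(totals))  # 0.0, no sampling variability
```

Every sample is the whole population, so the estimate is the same in every repetition regardless of the selection probabilities.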

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion