On Wed, Jan 18, 2017 at 1:58 AM, aleba...@gmail.com <aleba...@gmail.com> wrote:
> > > 2017-01-17 22:13 GMT+01:00 Nadav Har'El <n...@scylladb.com>: > >> >> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com <aleba...@gmail.com> >> wrote: >> >>> Hi Nadav, >>> >>> I may be wrong, but I think that the result of the current >>> implementation is actually the expected one. >>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and >>> 0.4 >>> >>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2]) >>> >> >> Yes, this formula does fit well with the actual algorithm in the code. >> But, my question is *why* we want this formula to be correct: >> >> Just a note: this formula is correct and it is one of statistics > fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability + > https://en.wikipedia.org/wiki/Bayes%27_theorem > Hi, Yes, of course the formula is correct, but it doesn't mean we're not applying it in the wrong context. I'll be honest here: I came to numpy.random.choice after I actually coded a similar algorithm (with the same results) myself, because like you I thought this was the "obvious" and correct algorithm. Only then I realized that its output doesn't actually produce the desired probabilities specified by the user - even in the cases where that is possible. And I started wondering if existing libraries - like numpy - do this differently. And it turns out, numpy does it (basically) in the same way as my algorithm. > > Thus, the result we get from random.choice IMHO definitely makes sense. > Let's look at what the user asked this function, and what it returns: User asks: please give me random pairs of the three items, where item 1 has probability 0.2, item 2 has 0.4, and 3 has 0.4. Function returns: random pairs, where if you make many random returned results (as in the law of large numbers) and look at the items they contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is 0.38333. These are not (quite) the probabilities the user asked for... Can you explain a sense where the user's requested probabilities (0.2, 0.4, 0.4) are actually adhered in the results which random.choice returns? Thanks, Nadav Har'El.
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion