On Wed, Jan 18, 2017 at 1:58 AM, aleba...@gmail.com <aleba...@gmail.com>

> 2017-01-17 22:13 GMT+01:00 Nadav Har'El <n...@scylladb.com>:
>> On Tue, Jan 17, 2017 at 7:18 PM, aleba...@gmail.com <aleba...@gmail.com>
>> wrote:
>>> Hi Nadav,
>>> I may be wrong, but I think that the result of the current
>>> implementation is actually the expected one.
>>> Using you example: probabilities for item 1, 2 and 3 are: 0.2, 0.4 and
>>> 0.4
>>> P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])
>> Yes, this formula does fit well with the actual algorithm in the code.
>> But, my question is *why* we want this formula to be correct:
>> Just a note: this formula is correct and it is one of statistics
> fundamental law: https://en.wikipedia.org/wiki/Law_of_total_probability +
> https://en.wikipedia.org/wiki/Bayes%27_theorem


Yes, of course the formula is correct, but it doesn't mean we're not
applying it in the wrong context.

I'll be honest here: I came to numpy.random.choice after I actually coded a
similar algorithm (with the same results) myself, because like you I
thought this was the "obvious" and correct algorithm. Only then I realized
that its output doesn't actually produce the desired probabilities
specified by the user - even in the cases where that is possible. And I
started wondering if existing libraries - like numpy - do this differently.
And it turns out, numpy does it (basically) in the same way as my algorithm.

> Thus, the result we get from random.choice IMHO definitely makes sense.

Let's look at what the user asked this function, and what it returns:

User asks: please give me random pairs of the three items, where item 1 has
probability 0.2, item 2 has 0.4, and 3 has 0.4.

Function returns: random pairs, where if you make many random returned
results (as in the law of large numbers) and look at the items they
contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
These are not (quite) the probabilities the user asked for...

Can you explain a sense where the user's requested probabilities (0.2, 0.4,
0.4) are actually adhered in the results which random.choice returns?

Nadav Har'El.
NumPy-Discussion mailing list

Reply via email to