# Re: [Numpy-discussion] Question about numpy.random.choice with probabilities

```On Wed, Jan 18, 2017 at 8:53 AM, <josef.p...@gmail.com> wrote:

>
>
> On Wed, Jan 18, 2017 at 4:52 AM, Nadav Har'El <n...@scylladb.com> wrote:
>
>>
>> On Wed, Jan 18, 2017 at 11:00 AM, aleba...@gmail.com <aleba...@gmail.com>
>> wrote:
>>
>>> Let's look at what the user asked this function, and what it returns:
>>>
>>>>
>>>> User asks: please give me random pairs of the three items, where item 1
>>>> has probability 0.2, item 2 has 0.4, and 3 has 0.4.
>>>>
>>>> Function returns: random pairs, where if you make many random returned
>>>> results (as in the law of large numbers) and look at the items they
>>>> contain, item 1 is 0.2333 of the items, item 2 is 0.38333, and item 3 is
>>>> 0.38333.
>>>> These are not (quite) the probabilities the user asked for...
>>>>
>>>> Can you explain a sense in which the user's requested probabilities (0.2,
>>>> 0.4, 0.4) are actually adhered to in the results which random.choice returns?
>>>>
>>>
>>> I think that the question the user is asking by specifying p is a
>>> slightly different one:
>>>      "please give me random pairs of the three items extracted from a
>>> population of 3 items where item 1 has probability of being extracted of
>>> 0.2, item 2 has 0.4, and 3 has 0.4. Also please remove items once
>>> extracted."
>>>
>>
>> You are right, if that is what the user wants, numpy.random.choice does
>> the right thing.
>>
>> I'm just wondering whether this is actually what users want, and whether
>> they understand this is what they are getting.
>>
>> As I said, I expected it to generate pairs with, empirically, the desired
>> distribution of individual items. The documentation of numpy.random.choice
>> seemed to me (wrongly) to imply that that's what it does. So I was
>> surprised to realize that it does not.
>>
>
> As Alessandro and you showed, the function returns something that makes
> sense. If the user wants something different, then they need to look for a
> different function, which is however difficult if it doesn't have a
> solution in general.
>
> Sounds to me a bit like a Monty Hall problem. Whether we like it or not,
> or find it counterintuitive, it is what it is given the sampling scheme.
>
> Having more sampling schemes would be useful, but it's not possible to
> implement sampling schemes with impossible properties.
>```
```
BTW: sampling 3 out of 3 without replacement is even worse

No matter what sampling scheme and what selection probabilities we use, we
always have every element with probability 1 in the sample.

(Which in survey statistics implies that the sampling error or standard
deviation of any estimate of a population mean or total is zero. Which I
found weird. How can you do statistics and get an estimate that doesn't
have any uncertainty associated with it?)

Josef

>
> Josef
```
```_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion
```
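The numbers quoted in the thread (0.2333 and 0.38333) can be reproduced exactly. The sketch below is only an illustration: it assumes the sampling scheme is "draw one item at a time, remove it, renormalise the remaining probabilities", as described in the thread, and the helper name `inclusion_probabilities` is made up for this example rather than being part of NumPy.

```python
from itertools import permutations


def inclusion_probabilities(p, size):
    """Exact P(item i ends up in the sample) for sequential draws
    without replacement, renormalising p after each draw.

    Hypothetical helper for illustration; not a NumPy function.
    """
    incl = [0.0] * len(p)
    # Enumerate every ordered sample of the requested size.
    for order in permutations(range(len(p)), size):
        prob = 1.0
        remaining = 1.0  # total probability mass still in the population
        for i in order:
            prob *= p[i] / remaining  # renormalised chance of drawing i now
            remaining -= p[i]         # item i is removed from the population
        for i in order:
            incl[i] += prob
    return incl


p = [0.2, 0.4, 0.4]

# Pairs: each sample holds 2 items, so the share of an item among all
# returned items is its inclusion probability divided by 2.
incl_pairs = inclusion_probabilities(p, 2)
shares = [x / 2 for x in incl_pairs]
print(shares)  # approximately [0.2333, 0.3833, 0.3833]

# Sampling 3 out of 3: every item is included with probability 1,
# whatever p is.
print(inclusion_probabilities(p, 3))
```

An empirical check along the same lines would be to call `np.random.choice(3, size=2, replace=False, p=p)` many times and count how often each item appears; the counts should approach the shares computed above.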