Re: [Numpy-discussion] categorical distributions

David Warde-Farley Mon, 22 Nov 2010 00:17:10 -0800

On 2010-11-22, at 2:51 AM, Hagen Fürstenau wrote:

> but this is bound to be inefficient as soon as the vector of
> probabilities gets large, especially if you want to draw multiple samples.
> 
> Have I overlooked something or should this be added?


I think you misunderstand the point of multinomial distributions. A sample from
a multinomial is simply a sample from n i.i.d. categoricals, reported as the
counts for each category in the n observations. It's very easy to recover the
'categorical' samples from a 'multinomial' sample.

import numpy as np

# a[k] is the number of the 50 draws that landed in category k
a = np.random.multinomial(50, [.3, .3, .4])
b = np.zeros(50, dtype=int)
# category k fills the slice b[lower[k]:upper[k]]
upper = np.cumsum(a)
lower = upper - a

for value in range(len(a)):
    b[lower[value]:upper[value]] = value
# mix up the order, in-place, if you care about them not being sorted
np.random.shuffle(b)

Then b holds 50 draws from the corresponding 'categorical' distribution.
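
The same expansion of counts into labels can probably be written without the explicit loop. The following is only a sketch: the helper name categorical_from_multinomial is made up for illustration and is not part of NumPy, and it assumes np.repeat is acceptable for turning the counts into labels.

```python
import numpy as np

def categorical_from_multinomial(n, pvals):
    """Hypothetical helper: n categorical draws from one multinomial draw."""
    # counts[k] is how many of the n draws fell into category k
    counts = np.random.multinomial(n, pvals)
    # repeat each category label k exactly counts[k] times (sorted order)
    samples = np.repeat(np.arange(len(pvals)), counts)
    # shuffle in place so the labels are no longer grouped by category
    np.random.shuffle(samples)
    return samples

samples = categorical_from_multinomial(50, [.3, .3, .4])
```

Calling np.bincount(samples, minlength=3) on the result should give back the multinomial counts, which is a quick way to check that nothing was lost in the expansion.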

David
_______________________________________________
NumPy-Discussion mailing list
[email protected]
http://mail.scipy.org/mailman/listinfo/numpy-discussion
