On Mon, Jul 2, 2012 at 9:35 PM, <josef.p...@gmail.com> wrote:
> On Mon, Jul 2, 2012 at 8:08 PM, <josef.p...@gmail.com> wrote:
> > On Mon, Jul 2, 2012 at 4:16 PM, Fernando Perez <fperez....@gmail.com> wrote:
> >> Hi all,
> >>
> >> in recent work with a colleague, the need came up for a multivariate
> >> hypergeometric sampler; I had a look in the numpy code and saw we have
> >> the bivariate version, but not the multivariate one.
> >>
> >> I had a look at the code in scipy.stats.distributions, and it doesn't
> >> look too difficult to add a proper multivariate hypergeometric by
> >> extending the bivariate code, with one important caveat: the hard part
> >> is the implementation of the actual discrete hypergeometric sampler,
> >> which lives inside of numpy/random/mtrand/distributions.c:
> >>
> >> https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/distributions.c#L743
> >>
> >> That code is hand-written C, and it only works for the bivariate case
> >> right now. It doesn't look terribly difficult to extend, but it will
> >> certainly take a bit of care and testing to ensure all edge cases are
> >> handled correctly.
> >
> > My only foray into this
> >
> > http://projects.scipy.org/numpy/ticket/921
> > http://projects.scipy.org/numpy/ticket/923
> >
> > This looks difficult to add without a good reference and clear
> > description of the algorithm.
> >
> >>
> >> Does anyone happen to have that implemented lying around, in a form
> >> that would be easy to merge to add this capability to numpy?
> >
> > not me, I have never even heard of multivariate hypergeometric
> > distribution.
> >
> >
> > maybe http://hal.inria.fr/docs/00/11/00/56/PDF/perm.pdf p.11
> >
> > with some properties
> > http://www.math.uah.edu/stat/urn/MultiHypergeometric.html
> >
> > I've seen one other algorithm, that seems to need N (number of draws
> > in hypergeom) random variables for one multivariate hypergeometric
> > random draw, which seems slow to me.
> >
> > But maybe someone has it lying around.
>
> Now I have a pure num/sci/python version around.
>
> A bit more than an hour, so no guarantees, but freq and pmf look close
> enough.

I could be wrong, but I think PyMC has sampling and likelihood.

Skipper
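
For reference, one way to build a multivariate draw on top of the existing
two-colour hypergeometric sampler is the conditional method: draw the count
for the first colour against all later colours pooled together, then repeat
on what is left. The sketch below assumes a NumPy recent enough to provide
numpy.random.default_rng; the function name and its arguments are made up
for illustration, and this is not the pure-Python version mentioned above
nor the PyMC code. It needs one univariate draw per colour rather than one
random variable per item drawn.

```python
import numpy as np


def multivariate_hypergeometric_sketch(colors, nsample, rng=None):
    """Draw one multivariate hypergeometric sample (hypothetical helper).

    colors  : number of items of each colour in the urn
    nsample : number of items drawn without replacement
    Returns an integer array with the count drawn of each colour.
    """
    rng = np.random.default_rng() if rng is None else rng
    colors = np.asarray(colors, dtype=np.int64)
    if colors.ndim != 1 or colors.size == 0 or np.any(colors < 0):
        raise ValueError("colors must be a non-empty 1-D array of counts >= 0")
    if nsample < 0 or nsample > colors.sum():
        raise ValueError("nsample must lie between 0 and colors.sum()")

    out = np.zeros_like(colors)
    later = int(colors.sum())   # items not yet assigned to a colour
    remaining = int(nsample)    # draws still to be distributed

    for i, c in enumerate(colors[:-1]):
        if remaining == 0:
            break
        later -= int(c)         # now: items in colours after colour i
        if c == 0:
            continue            # nothing of this colour can be drawn
        if later == 0:
            # Only this colour is left, so it takes every remaining draw.
            out[i] = remaining
            remaining = 0
            break
        # Univariate draw: colour i is "good", all later colours are "bad".
        x = int(rng.hypergeometric(int(c), later, remaining))
        out[i] = x
        remaining -= x

    out[-1] += remaining        # whatever is left belongs to the last colour
    return out
```

For example, multivariate_hypergeometric_sketch([5, 0, 7, 3], 8) returns four
counts that sum to 8, none exceeding the number of items of that colour.
Comparing empirical frequencies against the pmf, as was done above, is a
reasonable check before relying on it.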
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion