Well, yes, my GaussianCluster.pdf() is rather lame. It even says so in the TODO. The computational cost is in computing dNorm, the per-element term probability. The Reuters vectors had 46k terms, IIRC, so this took a long time to compute. Suggestions for improvements here are welcome!
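
One possible direction for the underflow, as a minimal sketch rather than the actual Mahout code: sum per-term log densities instead of multiplying per-term densities, and only exponentiate after subtracting the maximum when normalizing across clusters. The class and method names below (LogPdfSketch, logPdf, naivePdf, normalize) are hypothetical, the vectors are plain double[] arrays rather than Mahout Vector objects, and a diagonal Gaussian with strictly positive standard deviations is assumed.

```java
// Hypothetical sketch; not part of Mahout. Assumes a diagonal Gaussian
// with stdDev[i] > 0 for every term.
final class LogPdfSketch {
    private static final double LOG_SQRT_2PI = 0.5 * Math.log(2.0 * Math.PI);

    // Log density of x: the sum of the independent per-term log densities.
    // Stays finite where the plain product would underflow to 0.0.
    static double logPdf(double[] x, double[] mean, double[] stdDev) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / stdDev[i];
            sum += -0.5 * z * z - Math.log(stdDev[i]) - LOG_SQRT_2PI;
        }
        return sum;
    }

    // The naive product of per-term densities, shown only for comparison.
    static double naivePdf(double[] x, double[] mean, double[] stdDev) {
        double product = 1.0;
        for (int i = 0; i < x.length; i++) {
            double z = (x[i] - mean[i]) / stdDev[i];
            product *= Math.exp(-0.5 * z * z) / (stdDev[i] * Math.sqrt(2.0 * Math.PI));
        }
        return product;
    }

    // Turns per-cluster log densities into weights that sum to 1.
    // Subtracting the maximum first keeps at least one exp() equal to 1.
    static double[] normalize(double[] logPdfs) {
        double max = Double.NEGATIVE_INFINITY;
        for (double lp : logPdfs) {
            max = Math.max(max, lp);
        }
        double total = 0.0;
        double[] weights = new double[logPdfs.length];
        for (int i = 0; i < logPdfs.length; i++) {
            weights[i] = Math.exp(logPdfs[i] - max);
            total += weights[i];
        }
        for (int i = 0; i < weights.length; i++) {
            weights[i] /= total;
        }
        return weights;
    }
}
```

With 46k terms, naivePdf underflows to zero even when x sits exactly on the mean with unit standard deviations, while logPdf returns a finite negative number that can still be compared across clusters. This addresses the underflow only; it still makes one pass over every term, so it does not by itself reduce the per-term cost.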

-----Original Message-----
From: Ted Dunning [mailto:[email protected]]
Sent: Wednesday, November 02, 2011 2:33 PM
To: [email protected]
Subject: Re: Dirchlet

That just sounds like a poor implementation, actually, rather than a bad model.

On Wed, Nov 2, 2011 at 2:29 PM, Jeff Eastman <[email protected]> wrote:

> The pdf() calculation over wide topic vectors does a lot of complicated
> math for each term pdf and then underflows on the combined pdf() product to
> boot.
