Re: Dirichlet

Grant Ingersoll Thu, 03 Nov 2011 14:12:59 -0700

On Nov 3, 2011, at 4:40 PM, Ted Dunning wrote:

> Sure.
> 
> If we take a normal distribution in one dimension (for simplicity), the
> sufficient statistics are the sample mean, the sample variance and the
> number of observations.  These can be collected by keeping the sum, the sum
> of squares and the count and it is easy to see how to combine model
> estimates by just adding these three numbers together.  Moreover, a prior
> can be expressed as the sufficient statistics of some number of virtual
> observations, so producing a single-sample model from a single observation x
> consists of just adding x to the prior sum, x^2 to the prior sum of squares
> and 1 to the prior count.  Any real implementation should be a little bit
> fancier, since computing variance from the sum and sum of squares is
> numerically unstable (catastrophic cancellation when the mean is large
> relative to the spread).  Something like Welford's method would actually be
> what is used.
> 
> For a multivariate normal distribution, we have some important special
> cases.  These include the symmetric normal (covariance is the identity
> matrix times a constant), the axis aligned normal (covariance is diagonal)
> and the general case.  The symmetric normal can just be a (slightly
> more complicated) extension of the one dimensional case.  The general case
> requires that we keep the vector sum and the matrix sum of the outer
> products of the observations, analogously to the way that sum and sum of
> squares were kept for the one dimensional case and, again, some numerical
> sophistication is required to make this a stable on-line algorithm.
> Typically a symmetric normal is used as the prior for the multivariate
> case (see Wikipedia for one simple estimation method
> <http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Estimation_of_parameters>
> that can be suitably extended to the on-line case).  I think that there is a
> direct analogue of Welford's method in multiple dimensions.
> 
> Does this help?

So, to deploy a Combiner, we need to understand what type of distribution we
are dealing with (which we do know, but may need a marker interface or
something).  Then, if it is a "Combinable" distribution, it can do the above
(which I admittedly need to work through a bit more)?  Do we have Welford
implemented in our math package?
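
For reference, here is a minimal Python sketch of what a Welford-style
accumulator for the one dimensional case might look like. The class name
RunningStats and its methods are hypothetical, not anything that exists in
our math package. The merge step uses the pairwise formula usually attributed
to Chan, Golub and LeVeque, which is one way (an assumption on my part, not
necessarily what Ted has in mind) to "add" two models without going back to
raw sums of squares.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Hypothetical holder for count, mean and M2 (sum of squared deviations)."""
    n: float = 0.0
    mean: float = 0.0
    m2: float = 0.0

    def add(self, x: float) -> None:
        # Welford's on-line update: adjust the mean first, then accumulate
        # the squared deviation using both the old and the new mean.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningStats") -> "RunningStats":
        # Pairwise combination of two partial results (Chan et al.).
        n = self.n + other.n
        if n == 0:
            return RunningStats()
        delta = other.mean - self.mean
        mean = self.mean + delta * other.n / n
        m2 = self.m2 + other.m2 + delta * delta * self.n * other.n / n
        return RunningStats(n, mean, m2)

    def variance(self) -> float:
        # Population variance; returns 0.0 when there are no observations.
        return self.m2 / self.n if self.n > 0 else 0.0
```

Merging two accumulators built from disjoint halves of the data should give
the same count, mean and variance as a single pass over all of it.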

Rough pseudo code would be really helpful as to what the Combiner might look 
like.  I'll worry about the math later.
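
Something along these lines is roughly what I picture, as a hedged Python
sketch rather than real Hadoop code. It assumes each mapper emits
(cluster id, statistics) pairs where the statistics are a (count, mean, M2)
tuple for a one dimensional normal; the names merge, from_observation,
combine and posterior are all made up for illustration.

```python
from typing import Dict, Iterable, Tuple

Stats = Tuple[float, float, float]  # (count, mean, M2 = sum of squared deviations)


def merge(a: Stats, b: Stats) -> Stats:
    # Combine two partial sufficient statistics without revisiting the data.
    n = a[0] + b[0]
    if n == 0:
        return (0.0, 0.0, 0.0)
    delta = b[1] - a[1]
    mean = a[1] + delta * b[0] / n
    m2 = a[2] + b[2] + delta * delta * a[0] * b[0] / n
    return (n, mean, m2)


def from_observation(x: float) -> Stats:
    # A single observation: count 1, mean x, no spread yet.
    return (1.0, x, 0.0)


def combine(pairs: Iterable[Tuple[int, Stats]]) -> Dict[int, Stats]:
    # Combiner step: fold all statistics that share a cluster id into one.
    merged: Dict[int, Stats] = {}
    for cluster_id, stats in pairs:
        if cluster_id in merged:
            merged[cluster_id] = merge(merged[cluster_id], stats)
        else:
            merged[cluster_id] = stats
    return merged


def posterior(prior: Stats, observed: Stats) -> Stats:
    # The prior is treated as the statistics of some virtual observations,
    # so applying it is the same merge as everything else.
    return merge(prior, observed)
```

The point of the sketch is that the combiner never needs to know more than
"these statistics can be merged", which is where a marker interface could
come in.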

> 
> 
> On Thu, Nov 3, 2011 at 1:24 PM, Grant Ingersoll <[email protected]> wrote:
> 
>> 
>> On Nov 2, 2011, at 5:31 PM, Ted Dunning wrote:
>>> 
>>>  For some kinds of models, notably all of the ones from the
>>> exponential family, there exist sufficient statistics and the combination
>>> of models really is a lot like addition.  Most of the uses of DP clustering
>>> involve exponential family models like the normal distribution so the world
>>> is a happy place.
>> 
>> Can you elaborate on this a bit more in terms of concrete steps we could
>> take to implement this?

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com
