So far at least, all our DPC models extend AbstractCluster, so a Welford 
implementation there would address DPC, fuzzyk and kmeans. I'm curious to see 
what the add() method would look like for the OnlineGaussianAccumulator.
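
As a starting point, here is a minimal sketch of what such an add() might look like, assuming a diagonal (per-dimension) Gaussian and plain double[] observations rather than Mahout Vectors. The class name MergeableGaussian and all of its methods are hypothetical; this is not the existing OnlineGaussianAccumulator API. observe() is the usual Welford single-point update, and add() merges two sets of partial statistics using the standard pairwise combination of counts, means and sums of squared deviations.

```java
// Hypothetical accumulator for illustration only; not Mahout's
// OnlineGaussianAccumulator. Assumes both accumulators have the same
// number of dimensions (no check is done here).
final class MergeableGaussian {
    private long n;               // number of observations seen so far
    private final double[] mean;  // running mean per dimension
    private final double[] m2;    // sum of squared deviations from the mean

    MergeableGaussian(int dims) {
        mean = new double[dims];
        m2 = new double[dims];
    }

    // Welford's update for a single observation.
    void observe(double[] x) {
        n++;
        for (int i = 0; i < mean.length; i++) {
            double delta = x[i] - mean[i];
            mean[i] += delta / n;
            m2[i] += delta * (x[i] - mean[i]);
        }
    }

    // Merge another accumulator's partial statistics into this one.
    void add(MergeableGaussian other) {
        if (other.n == 0) {
            return; // nothing to merge
        }
        long total = n + other.n;
        for (int i = 0; i < mean.length; i++) {
            double delta = other.mean[i] - mean[i];
            // Combine squared deviations, correcting for the shift in mean.
            m2[i] += other.m2[i] + delta * delta * ((double) n * other.n / total);
            mean[i] += delta * other.n / total;
        }
        n = total;
    }

    long count() { return n; }

    double mean(int i) { return mean[i]; }

    // Sample variance; returns 0 when fewer than two observations were seen.
    double variance(int i) { return n > 1 ? m2[i] / (n - 1) : 0.0; }
}
```

Because add() only needs the count, mean and squared-deviation sum from each side, two accumulators built on different splits of the data should give the same result as one accumulator that saw every point, up to floating-point rounding. That is the property a Combiner would rely on.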

-----Original Message-----
From: Ted Dunning [mailto:[email protected]] 
Sent: Thursday, November 03, 2011 2:24 PM
To: [email protected]
Subject: Re: Dirchlet

On Thu, Nov 3, 2011 at 2:12 PM, Grant Ingersoll <[email protected]> wrote:

> ...
> So, to deploy a Combiner, we need to understand what type of distribution
> we are dealing with (which we do know, but may need a marker interface or
> something).  Then, if it is a "Combinable" distribution, it can do the above
> (which I admittedly need to work through a bit more)?  Do we have Welford
> implemented in our math package?
>

Yes.  A marker would be required.

The key method would be add(Model).

We do have a Welford implementation for one dimensional estimates
in org.apache.mahout.math.stats.OnlineSummarizer

> Rough pseudo code would be really helpful as to what the Combiner might
> look like.  I'll worry about the math later.
>

Here is a sketch:

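   import java.io.IOException;
   import org.apache.hadoop.mapreduce.Reducer;

   // Model is the mergeable type discussed above; in a real job the key and
   // value types would need to be Hadoop Writables.
   public class ModelCombiner extends Reducer<Integer, Model, Integer, Model> {
      @Override
      protected void reduce(Integer key, Iterable<Model> values, Context ctx)
            throws IOException, InterruptedException {
          Model m = null;
          for (Model x : values) {
               if (m == null) {
                  // Hadoop may reuse the value object between iterations,
                  // so a real implementation should copy x here.
                  m = x;
               } else {
                  m.add(x);
               }
          }
          ctx.write(key, m);
       }
    }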
