On Thu, Nov 3, 2011 at 2:18 PM, Jeff Eastman <[email protected]> wrote:
> AbstractCluster already has the running sum of squares implemented and the
> kmeans and fuzzyk combiners count on being able to combine its partial
> parameters (see ClusterObservations which are passed to combiner and
> reducer). I have an implementation of Welford in OnlineGaussianAccumulator
> which I would love to substitute, but I don't know the math to combine
> them. If, as you say, it is "like addition", could you please be more
> specific (i.e. suggest a combine(other) method for that OGA?)
>

That is an interesting idea, to actually put that method on the OGA. I have been thinking only in terms of models, but having it there as well wouldn't be bad at all.

OGA computes the mean and variance on a per-coordinate basis. This is the axis-aligned case that I mentioned.

>
> With respect to a Dirichlet combiner, the same mechanism of combining
> observations used in kmeans and fuzzyk should work, but perhaps those
> combiners should be passing clusters and combining cluster observations
> too, rather than just passing the running sums in ClusterObservations?
>

I think that a combiner-based clustering should only be passing clusters. A non-combiner clustering should pass points. A resolution for that tension is not obvious to me.

> This is something I would really like to clean up for 1.0
>

Indeed.
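
On the combine(other) question, here is a minimal sketch of what such a method could look like for the per-coordinate (axis-aligned) case. It is not the Mahout OnlineGaussianAccumulator: the class name RunningStats, its fields, and its unweighted observe(double[]) signature are all made up for illustration. The merge step is the standard pairwise update for Welford-style accumulators (count, mean, and sum of squared deviations M2), which is what makes combining "like addition": merging partial results in any grouping gives the same totals, up to rounding.

```java
import java.util.Arrays;

/**
 * Hypothetical stand-in for an online Gaussian accumulator.
 * NOT the Mahout class; names and signatures are assumptions.
 * Tracks count, mean, and M2 (sum of squared deviations) per coordinate.
 */
final class RunningStats {
  private long n;
  private final double[] mean;
  private final double[] m2;

  RunningStats(int dim) {
    mean = new double[dim];
    m2 = new double[dim];
  }

  /** Welford's single-observation update, applied to each coordinate. */
  void observe(double[] x) {
    checkDim(x.length);
    n++;
    for (int i = 0; i < mean.length; i++) {
      double delta = x[i] - mean[i];
      mean[i] += delta / n;
      m2[i] += delta * (x[i] - mean[i]);
    }
  }

  /** Folds another accumulator's partial state into this one. */
  void combine(RunningStats other) {
    checkDim(other.mean.length);
    if (other.n == 0) {
      return; // nothing to merge
    }
    if (n == 0) {
      // This side is empty, so just take the other side's state.
      System.arraycopy(other.mean, 0, mean, 0, mean.length);
      System.arraycopy(other.m2, 0, m2, 0, m2.length);
      n = other.n;
      return;
    }
    long total = n + other.n;
    for (int i = 0; i < mean.length; i++) {
      double delta = other.mean[i] - mean[i];
      // M2 gains the other side's M2 plus a between-group term.
      m2[i] += other.m2[i] + delta * delta * ((double) n * other.n / total);
      mean[i] += delta * other.n / total;
    }
    n = total;
  }

  long count() {
    return n;
  }

  double[] mean() {
    return Arrays.copyOf(mean, mean.length);
  }

  /** Population variance per coordinate (M2 / n); all zeros when empty. */
  double[] variance() {
    double[] v = new double[m2.length];
    for (int i = 0; i < v.length; i++) {
      v[i] = n == 0 ? 0.0 : m2[i] / n;
    }
    return v;
  }

  private void checkDim(int dim) {
    if (dim != mean.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
  }
}
```

In a combiner, each mapper-side accumulator would be merged with a.combine(b) instead of being rebuilt from points. The sketch ignores observation weights, which a real accumulator used for fuzzyk or Dirichlet would need; the same formulas should carry over with the counts replaced by summed weights, but that is not shown here.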
