Re: syntheticcontroldata clustering example failure due to combiner

Jeff Eastman Thu, 11 Jun 2009 10:22:46 -0700

Depending upon the T1 and T2 values you provide, Canopy will produce the initial "k" cluster centers for the K-Means step. Those then prime the iterations and the result should retain the same number of clusters.
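
To make the T1/T2 dependence concrete, here is a minimal sketch of the textbook canopy pass in plain Java. It is not Mahout's code: the class name CanopySeedSketch, the method canopyCenters, and the sample points are all made up for illustration, and it assumes Euclidean distance with T1 >= T2. The number of centers it returns is what would serve as "k" for the K-Means step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CanopySeedSketch {

    // Plain Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns one centroid per canopy; the size of the list is the "k"
    // that a following k-means step would start from.
    static List<double[]> canopyCenters(List<double[]> points, double t1, double t2) {
        if (t1 < t2) {
            throw new IllegalArgumentException("t1 must be >= t2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // The next unclaimed point seeds a new canopy.
            double[] seed = remaining.remove(0);
            double[] sum = seed.clone();
            int count = 1;
            List<double[]> next = new ArrayList<>();
            for (double[] p : remaining) {
                double d = distance(seed, p);
                if (d < t1) {
                    // Within the loose threshold: contributes to this centroid.
                    for (int i = 0; i < sum.length; i++) {
                        sum[i] += p[i];
                    }
                    count++;
                }
                if (d >= t2) {
                    // Outside the tight threshold: still available to later canopies.
                    next.add(p);
                }
            }
            remaining = next;
            for (int i = 0; i < sum.length; i++) {
                sum[i] /= count;
            }
            centers.add(sum);
        }
        return centers;
    }

    public static void main(String[] args) {
        // Illustrative data: two tight pairs far apart from each other.
        List<double[]> points = Arrays.asList(
                new double[] {0.0, 0.0}, new double[] {0.5, 0.0},
                new double[] {10.0, 10.0}, new double[] {10.5, 10.0});
        System.out.println(canopyCenters(points, 3.0, 1.0).size()); // 2 canopies
        System.out.println(canopyCenters(points, 0.4, 0.2).size()); // 4 canopies
    }
}
```

The same four points give two centers with wide thresholds and four with narrow ones, so the "k" handed to K-Means is a consequence of T1 and T2 rather than something set directly.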


Jeff



Benson Margulies wrote:

So what are you guys doing to get from an unpredictable number of
canopies to a 'k' value for k-means and an initial assignment of each
item to one cluster?


On Thu, Jun 11, 2009 at 12:49 PM, Adil Aijaz <[email protected]> wrote:

Jeff,

Thanks for the quick turnaround on this issue. Just tested it and the canopy
creation and kmeans both work now on syntheticcontroldata. I get 7 canopies
and 7 clusters. Collection logic in close() is not pretty but can't think of
a workaround myself.

adil

Jeff Eastman wrote:

r783617 removed the CanopyCombiner and refactored its semantics back into
the reducer. Updated unit tests pass and Synthetic Control with Canopy
produces 6 clusters. Kmeans also runs and produces 6 clusters. I really
don't like doing stuff in close() but see no practical alternative. Ideas
are still welcomed.
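
The thread does not show the reducer code, so the following is only a generic sketch of what "doing stuff in close()" tends to look like: the reducer remembers the collector it was handed, buffers values across reduce() calls, and emits everything when close() is called. The Collector interface and DeferredReducer class are hypothetical stand-ins written for this example; they are not the Hadoop or Mahout types.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseEmitSketch {

    // Hypothetical stand-in for an output collector (not the Hadoop interface).
    interface Collector {
        void collect(String key, String value);
    }

    // Hypothetical reducer-like class that defers all output to close().
    static final class DeferredReducer {
        private final List<String> buffered = new ArrayList<>();
        private Collector saved; // remembered so close() has somewhere to write

        // Called once per key: nothing is emitted here, values are only buffered.
        void reduce(String key, List<String> values, Collector out) {
            saved = out;
            for (String v : values) {
                buffered.add(key + ":" + v);
            }
        }

        // Called once at the end: all output happens here.
        void close() {
            if (saved == null) {
                return; // reduce() was never called, so there is nothing to emit
            }
            for (int i = 0; i < buffered.size(); i++) {
                saved.collect("C" + i, buffered.get(i));
            }
        }
    }

    public static void main(String[] args) {
        List<String> emitted = new ArrayList<>();
        Collector out = (k, v) -> emitted.add(k + "=" + v);

        DeferredReducer reducer = new DeferredReducer();
        reducer.reduce("a", List.of("1", "2"), out);
        reducer.reduce("b", List.of("3"), out);
        System.out.println("before close: " + emitted.size());
        reducer.close();
        System.out.println("after close: " + emitted.size());
    }
}
```

The awkward part is visible in the saved field: close() takes no collector argument, so the object has to hold on to one from an earlier reduce() call, and nothing is written if reduce() never runs.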

Jeff


Jeff Eastman wrote:

Adil Aijaz wrote:

2. There is a bug in
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
that called runJob from the main function with my provided arguments transposed.
So, my convergenceDelta was interpreted as t1, t1 as t2, and t2 as
convergenceDelta. I will commit a patch as soon as I get approval for
open source commits from my employer; however, I thought I'd put it out there
in case someone else is going through the same issue.

r783585 fixed the parameter ordering bug. Still working on the Combiner
problem.
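
For anyone wondering how a transposition like the one Adil describes gets past the compiler: when three adjacent parameters are all doubles, any ordering type-checks. The sketch below is not the actual Job.java code or the r783585 change. The runJob signature, the Params holder, and the sample values are assumptions made up for illustration. It shows a cheap sanity check (t1 >= t2) that happens to catch this particular mix-up.

```java
public class ArgumentOrderSketch {

    // Hypothetical holder; field names mirror the parameters named in the thread.
    static final class Params {
        final double t1;
        final double t2;
        final double convergenceDelta;

        Params(double t1, double t2, double convergenceDelta) {
            this.t1 = t1;
            this.t2 = t2;
            this.convergenceDelta = convergenceDelta;
        }
    }

    // Hypothetical stand-in for an entry point taking three doubles in a row.
    static Params runJob(double t1, double t2, double convergenceDelta) {
        // Canopy expects the loose threshold to be at least the tight one.
        if (t1 < t2) {
            throw new IllegalArgumentException(
                    "expected t1 >= t2, got t1=" + t1 + ", t2=" + t2);
        }
        return new Params(t1, t2, convergenceDelta);
    }

    public static void main(String[] args) {
        // Illustrative values only.
        double t1 = 80.0;
        double t2 = 55.0;
        double convergenceDelta = 0.5;

        // Intended order.
        Params ok = runJob(t1, t2, convergenceDelta);
        System.out.println("accepted: t1=" + ok.t1 + " t2=" + ok.t2);

        // Transposed order: compiles fine because every argument is a double,
        // but convergenceDelta lands in t1, t1 in t2, and t2 in convergenceDelta.
        try {
            runJob(convergenceDelta, t1, t2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The check only helps when the swapped values violate t1 >= t2. Passing a single parameter object with named fields, or parsing named command-line options, would avoid depending on position at all.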

Thanks Adil,
Jeff
