Canopy produces the initial "k" cluster centers for the K-Means step, and how
many canopies (and therefore what k) you get depends on the T1 and T2 values
you provide. Those centers then prime the K-Means iterations, and the result
should retain the same number of clusters.
Jeff
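As a rough illustration of how canopies can seed K-Means, here is a minimal single-machine sketch in Java. It is not Mahout's implementation. Euclidean distance, the method names canopyCenters and kMeans, and the sample points and thresholds are all assumptions made for illustration. The canopy pass decides k (its output size depends on T1 and T2), and the K-Means loop then only moves those k centers. An empty cluster keeps its old center, so k stays fixed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative single-machine sketch: canopy clustering picks the seeds, then
 * Lloyd-style K-Means refines them without changing k. Not Mahout's API.
 */
public class CanopySeededKMeans {

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static double[] mean(List<double[]> pts) {
        double[] m = new double[pts.get(0).length];
        for (double[] p : pts)
            for (int i = 0; i < m.length; i++) m[i] += p[i];
        for (int i = 0; i < m.length; i++) m[i] /= pts.size();
        return m;
    }

    /** Canopy pass (expects t1 > t2); the number of canopies depends on t1 and t2. */
    public static List<double[]> canopyCenters(List<double[]> points, double t1, double t2) {
        List<double[]> remaining = new ArrayList<>(points);
        List<double[]> centers = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] seed = remaining.remove(0);
            List<double[]> members = new ArrayList<>();
            members.add(seed);
            List<double[]> stillRemaining = new ArrayList<>();
            for (double[] p : remaining) {
                double d = distance(seed, p);
                if (d < t1) members.add(p);          // within T1: joins this canopy
                if (d >= t2) stillRemaining.add(p);  // outside T2: may still seed or join others
            }
            remaining = stillRemaining;
            centers.add(mean(members));              // canopy centroid becomes a K-Means seed
        }
        return centers;
    }

    static int nearest(List<double[]> centers, double[] p) {
        int best = 0;
        for (int c = 1; c < centers.size(); c++)
            if (distance(centers.get(c), p) < distance(centers.get(best), p)) best = c;
        return best;
    }

    /** K-Means seeded with the canopy centers; k = seeds.size() on every iteration. */
    public static List<double[]> kMeans(List<double[]> points, List<double[]> seeds,
                                        double convergenceDelta, int maxIterations) {
        List<double[]> centers = new ArrayList<>(seeds);
        for (int iter = 0; iter < maxIterations; iter++) {
            List<List<double[]>> assigned = new ArrayList<>();
            for (int c = 0; c < centers.size(); c++) assigned.add(new ArrayList<>());
            for (double[] p : points) assigned.get(nearest(centers, p)).add(p);
            boolean converged = true;
            List<double[]> next = new ArrayList<>();
            for (int c = 0; c < centers.size(); c++) {
                // An empty cluster keeps its old center, so k never shrinks.
                double[] updated = assigned.get(c).isEmpty() ? centers.get(c) : mean(assigned.get(c));
                if (distance(updated, centers.get(c)) > convergenceDelta) converged = false;
                next.add(updated);
            }
            centers = next;
            if (converged) break;
        }
        return centers;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0.0, 0.0}, new double[] {0.5, 0.0}, new double[] {0.0, 0.5},
            new double[] {10.0, 10.0}, new double[] {10.5, 10.0}, new double[] {10.0, 10.5});
        List<double[]> seeds = canopyCenters(points, 3.0, 1.0);  // illustrative T1, T2
        List<double[]> clusters = kMeans(points, seeds, 0.001, 10);
        System.out.println(seeds.size() + " canopies -> " + clusters.size() + " clusters");
    }
}
```

With T1 = 3.0 and T2 = 1.0, the two well-separated groups yield two canopies and therefore two K-Means clusters. Widening the thresholds (for example T1 = 30, T2 = 20) merges them into a single canopy. This is the sense in which the canopy count, and so k, is not known in advance.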
Benson Margulies wrote:
So what are you guys doing to get from an unpredictable number of
canopies to a 'k' value for k-means and an initial assignment of each
item to one cluster?
On Thu, Jun 11, 2009 at 12:49 PM, Adil Aijaz wrote:
Jeff,
Thanks for the quick turnaround on this issue. I just tested it, and canopy
creation and k-means both work now on the synthetic control data: I get 7
canopies and 7 clusters. The collection logic in close() is not pretty, but I
can't think of a workaround myself.
adil
Jeff Eastman wrote:
r783617 removed the CanopyCombiner and refactored its semantics back into
the reducer. The updated unit tests pass, and Synthetic Control with Canopy
produces 6 clusters. K-Means also runs and produces 6 clusters. I really
don't like doing work in close(), but I see no practical alternative. Ideas
are still welcome.
Jeff
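The close() discussion concerns a common MapReduce pattern: a reducer that can only finalize its output after it has seen every input. Below is a dependency-free sketch of that pattern under stated assumptions. The Collector interface is a hypothetical stand-in for Hadoop's OutputCollector, the incremental T1/T2 update is one common canopy variant, and none of this is the code committed in r783617. The reducer keeps a reference to the collector passed into reduce() so it has somewhere to write in close().

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative pattern only: buffer canopies across reduce() calls and emit them
 * in close(). Collector is a hypothetical stand-in for Hadoop's OutputCollector.
 */
public class CloseEmittingReducerSketch {

    /** Hypothetical stand-in for an output collector. */
    public interface Collector<T> {
        void collect(T value);
    }

    private final double t1;
    private final double t2;
    private final List<double[]> centers = new ArrayList<>();
    private final List<List<double[]>> members = new ArrayList<>();
    private Collector<double[]> savedCollector;  // captured in reduce(), used in close()

    public CloseEmittingReducerSketch(double t1, double t2) {
        this.t1 = t1;
        this.t2 = t2;
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) sum += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(sum);
    }

    /** Adds one point to the canopies; nothing is emitted yet because later points may still join. */
    public void reduce(double[] point, Collector<double[]> collector) {
        savedCollector = collector;
        boolean stronglyBound = false;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(centers.get(i), point);
            if (d < t1) members.get(i).add(point);  // within T1: joins an existing canopy
            if (d < t2) stronglyBound = true;       // within T2: too close to start a new one
        }
        if (!stronglyBound) {
            centers.add(point);
            List<double[]> fresh = new ArrayList<>();
            fresh.add(point);
            members.add(fresh);
        }
    }

    /** Only after the last reduce() call are the canopies final, so emit their centroids here. */
    public void close() {
        if (savedCollector == null) return;  // reduce() was never called
        for (List<double[]> canopy : members) {
            double[] centroid = new double[canopy.get(0).length];
            for (double[] p : canopy)
                for (int i = 0; i < centroid.length; i++) centroid[i] += p[i] / canopy.size();
            savedCollector.collect(centroid);
        }
    }

    public static void main(String[] args) {
        List<double[]> emitted = new ArrayList<>();
        CloseEmittingReducerSketch reducer = new CloseEmittingReducerSketch(3.0, 1.0);
        double[][] points = {{0, 0}, {0.5, 0}, {10, 10}, {10.5, 10}};
        for (double[] p : points) reducer.reduce(p, emitted::add);
        System.out.println("emitted before close(): " + emitted.size());
        reducer.close();
        System.out.println("emitted after close(): " + emitted.size());
    }
}
```

This shows why the collection logic ends up in close(). The canopies are not final until the last point has arrived, and the collector has to be captured from an earlier reduce() call.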
Jeff Eastman wrote:
Adil Aijaz wrote:
2. There is a bug in
examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
that calls runJob from the main function with my arguments transposed.
So my convergenceDelta was interpreted as t1, t1 as t2, and t2 as
convergenceDelta. I will commit a patch as soon as I get approval for
open-source commits from my employer; however, I thought I'd put it out there
in case someone else runs into the same issue.
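One defensive pattern against this kind of transposition is to bind each positional argument to a named field exactly once and then sanity-check the values. The sketch below assumes a hypothetical argument order (input, output, t1, t2, convergenceDelta, maxIterations) and a stand-in runJob. It is not the r783585 fix, which reordered the call itself. The T1 > T2 check comes from the canopy algorithm's requirement that the loose threshold exceed the tight one.

```java
/**
 * Illustrative only: bind positional arguments to named fields once, then
 * sanity-check them. Argument order and runJob here are hypothetical.
 */
public class JobArgsSketch {

    public static final class Params {
        public final String input;
        public final String output;
        public final double t1;
        public final double t2;
        public final double convergenceDelta;
        public final int maxIterations;

        private Params(String input, String output, double t1, double t2,
                       double convergenceDelta, int maxIterations) {
            this.input = input;
            this.output = output;
            this.t1 = t1;
            this.t2 = t2;
            this.convergenceDelta = convergenceDelta;
            this.maxIterations = maxIterations;
        }

        /** Assumed order: input output t1 t2 convergenceDelta maxIterations. */
        public static Params fromArgs(String[] args) {
            if (args.length != 6) {
                throw new IllegalArgumentException(
                    "usage: <input> <output> <t1> <t2> <convergenceDelta> <maxIterations>");
            }
            Params p = new Params(args[0], args[1],
                Double.parseDouble(args[2]), Double.parseDouble(args[3]),
                Double.parseDouble(args[4]), Integer.parseInt(args[5]));
            // Canopy requires T1 > T2; a shifted convergenceDelta usually violates this.
            if (p.t1 <= p.t2) {
                throw new IllegalArgumentException(
                    "t1 (" + p.t1 + ") must be greater than t2 (" + p.t2 + ")");
            }
            return p;
        }
    }

    /** Hypothetical stand-in for runJob: takes named values instead of a long positional list. */
    static String runJob(Params p) {
        return "t1=" + p.t1 + " t2=" + p.t2 + " convergenceDelta=" + p.convergenceDelta
            + " maxIterations=" + p.maxIterations;
    }

    public static void main(String[] args) {
        System.out.println(runJob(Params.fromArgs(new String[] {"in", "out", "80", "55", "0.5", "10"})));
        try {
            // What runJob effectively saw: convergenceDelta as t1, t1 as t2, t2 as convergenceDelta.
            Params.fromArgs(new String[] {"in", "out", "0.5", "80", "55", "10"});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With illustrative values t1 = 80, t2 = 55 and convergenceDelta = 0.5, the shifted order hands the job t1 = 0.5 and t2 = 80. The check rejects that immediately instead of letting the job run with the wrong thresholds.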
r783585 fixed the parameter ordering bug. Still working on the Combiner
problem.
Thanks Adil,
Jeff