Re: syntheticcontroldata clustering example failure due to combiner

Jeff Eastman Thu, 11 Jun 2009 10:33:31 -0700

Good to hear. The current implementation is actually the first one I did, so it was easy to revert to that model. It does require the mapper to retain all of the canopies, however, and this could create an OOME if the T values are poorly chosen. Doing the centroid calculation in the combiner removed this difficulty, but the Hadoop semantics change makes it a non-starter. If there were some globally unique way to create new cluster identifiers as they are needed, the centroid calculation could be moved to the reducer. There would still be a need to combine the clusters created by each of the mappers...
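One conceivable way to get globally unique cluster identifiers, offered only as a sketch: if each mapper is assumed to know a unique, non-negative integer task id, it can pack that id together with a local counter into a single long. The class name ClusterIdSketch, the taskId parameter, and the bit layout are all hypothetical and are not taken from Mahout or Hadoop.

```java
/**
 * Hypothetical id generator: high 32 bits hold the (assumed unique) mapper
 * task id, low 32 bits hold a counter local to that mapper.
 */
class ClusterIdSketch {
    private final int taskId; // assumption: unique per mapper, non-negative
    private int next;         // local counter, starts at 0

    ClusterIdSketch(int taskId) {
        this.taskId = taskId;
    }

    /** Returns an id that no other mapper with a different taskId can produce. */
    long nextId() {
        return ((long) taskId << 32) | (next++ & 0xFFFFFFFFL);
    }

    /** Recovers the task id that created a given cluster id. */
    static int taskOf(long id) {
        return (int) (id >>> 32);
    }
}
```

This only addresses identifier creation. Clusters created independently by different mappers would still have to be combined afterwards.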


Jeff



Adil Aijaz wrote:

Jeff,
Thanks for the quick turnaround on this issue. Just tested it and the canopy creation and kmeans both work now on syntheticcontroldata. I get 7 canopies and 7 clusters. Collection logic in close() is not pretty but can't think of a workaround myself.
adil

Jeff Eastman wrote:
r783617 removed the CanopyCombiner and refactored its semantics back into the reducer. Updated unit tests pass and Synthetic Control with Canopy produces 6 clusters. Kmeans also runs and produces 6 clusters. I really don't like doing stuff in close() but see no practical alternative. Ideas are still welcomed.
Jeff
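For readers following the thread, here is a minimal in-memory sketch of the pattern being discussed: canopies are accumulated while points arrive and only handed over when close() is called. It uses no Hadoop or Mahout classes. The names CanopySketch, addPoint, and close, and the use of Euclidean distance, are assumptions made for illustration and are not the committed code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a mapper that retains canopies until close(). */
class CanopySketch {
    /** A canopy: a fixed center plus running sums for the centroid. */
    static final class Canopy {
        final double[] center;
        final double[] sum;
        int count;

        Canopy(double[] point) {
            center = point.clone();
            sum = new double[point.length];
            add(point);
        }

        void add(double[] point) {
            for (int i = 0; i < point.length; i++) {
                sum[i] += point[i];
            }
            count++;
        }

        double[] centroid() {
            double[] c = new double[sum.length];
            for (int i = 0; i < sum.length; i++) {
                c[i] = sum[i] / count;
            }
            return c;
        }
    }

    // Every canopy created so far stays in memory until close().
    private final List<Canopy> canopies = new ArrayList<>();
    private final double t1; // loose threshold, expected to be > t2
    private final double t2; // tight threshold

    CanopySketch(double t1, double t2) {
        this.t1 = t1;
        this.t2 = t2;
    }

    static double distance(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) {
            s += (a[i] - b[i]) * (a[i] - b[i]);
        }
        return Math.sqrt(s);
    }

    /** Adds the point to every canopy within t1; starts a new canopy if none is within t2. */
    void addPoint(double[] p) {
        boolean stronglyBound = false;
        for (Canopy c : canopies) {
            double d = distance(c.center, p);
            if (d < t1) {
                c.add(p);
            }
            if (d < t2) {
                stronglyBound = true;
            }
        }
        if (!stronglyBound) {
            canopies.add(new Canopy(p));
        }
    }

    /** Called once at the end: computes and returns one centroid per canopy. */
    List<double[]> close() {
        List<double[]> out = new ArrayList<>();
        for (Canopy c : canopies) {
            out.add(c.centroid());
        }
        return out;
    }
}
```

The memory concern is visible here: when t2 is very small relative to the spacing of the data, almost every point starts its own canopy, so the retained list grows with the input.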


Jeff Eastman wrote:
Adil Aijaz wrote:
2. There is a bug in examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java that called runJob from the main function with my provided arguments transposed. So, my convergenceDelta was interpreted as t1, t1 as t2, and t2 as convergenceDelta. I will commit a patch as soon as I get approval for opensource commits from my employer; however, I thought I'd put it out there in case someone else is going through the same issue.
r783585 fixed the parameter ordering bug. Still working on the Combiner problem.
Thanks Adil,
Jeff
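A transposition like this compiles silently when all three parameters share a type. The sketch below illustrates the failure mode and one possible guard. The runJob signature shown is hypothetical and is not the actual Job.java signature, the Options class is invented for illustration, and the numeric values in the usage note are illustrative only.

```java
/** Hypothetical illustration of same-typed positional arguments being transposed. */
class TransposedArgsSketch {
    // Assumed shape only: three doubles in a row, so any ordering compiles.
    static String runJob(double t1, double t2, double convergenceDelta) {
        return "t1=" + t1 + " t2=" + t2 + " delta=" + convergenceDelta;
    }

    /** Named setters make each value's meaning explicit at the call site. */
    static final class Options {
        private double t1;
        private double t2;
        private double convergenceDelta;

        Options t1(double v) { this.t1 = v; return this; }
        Options t2(double v) { this.t2 = v; return this; }
        Options convergenceDelta(double v) { this.convergenceDelta = v; return this; }

        /** Canopy thresholds are expected to satisfy t1 > t2. */
        void validate() {
            if (t1 <= t2) {
                throw new IllegalArgumentException(
                    "t1 (" + t1 + ") must be greater than t2 (" + t2 + ")");
            }
        }
    }

    static String runJob(Options o) {
        o.validate();
        return runJob(o.t1, o.t2, o.convergenceDelta);
    }
}
```

With illustrative values t1 = 80, t2 = 55 and convergenceDelta = 0.5, the ordering described above would pass 0.5 as t1 and 80 as t2, which the t1 > t2 check rejects. The check would not catch every possible transposition, so named options remain the main safeguard in this sketch.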
