For L_2 centroids, the mapper just has to emit each input vector as a trivial sum along with a count of 1. The combiner should take a list of vector sums and counts and produce a combined sum and count. The reducer then gets the partial sums and counts, adds them together, and divides the total sum by the total count (just like n-dimensional word count!).
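Something like the following plain-Java sketch of the sum/count idea (the SumAndCount class and the method names are just for illustration here, not the actual Mahout or Hadoop API):

// A minimal, self-contained sketch of the sum-and-count approach.
// Everything below is illustrative only -- not real Mahout/Hadoop classes.
import java.util.Arrays;
import java.util.List;

public class CentroidSketch {

  /** A partial result: an element-wise vector sum plus how many vectors it covers. */
  static class SumAndCount {
    final double[] sum;
    final long count;

    SumAndCount(double[] sum, long count) {
      this.sum = sum;
      this.count = count;
    }
  }

  /** Mapper step: each input vector becomes a trivial sum with a count of 1. */
  static SumAndCount map(double[] point) {
    return new SumAndCount(point.clone(), 1);
  }

  /** Combiner (and reducer merge) step: add the sums element-wise and add the counts. */
  static SumAndCount combine(List<SumAndCount> partials) {
    double[] total = new double[partials.get(0).sum.length];
    long count = 0;
    for (SumAndCount p : partials) {
      for (int i = 0; i < total.length; i++) {
        total[i] += p.sum[i];
      }
      count += p.count;
    }
    return new SumAndCount(total, count);
  }

  /** Reducer finish step: divide the combined sum by the combined count to get the centroid. */
  static double[] centroid(List<SumAndCount> partials) {
    SumAndCount merged = combine(partials);
    double[] result = new double[merged.sum.length];
    for (int i = 0; i < result.length; i++) {
      result[i] = merged.sum[i] / merged.count;
    }
    return result;
  }

  public static void main(String[] args) {
    List<SumAndCount> partials = Arrays.asList(
        map(new double[] {1.0, 2.0}),
        map(new double[] {3.0, 4.0}));
    // Prints [2.0, 3.0], the mean of the two input vectors.
    System.out.println(Arrays.toString(centroid(partials)));
  }
}

In a real job the mapper would emit these partial sums keyed by cluster id, the combiner would apply the merge step locally, and the reducer would do the final divide.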

On Thu, Jun 11, 2009 at 9:49 AM, Adil Aijaz <[email protected]> wrote:

> Jeff,
>
> Thanks for the quick turnaround on this issue. Just tested it, and canopy
> creation and kmeans both work now on syntheticcontroldata. I get 7 canopies
> and 7 clusters. The collection logic in close() is not pretty, but I can't
> think of a workaround myself.
>
> adil
>
> Jeff Eastman wrote:
>
>> r783617 removed the CanopyCombiner and refactored its semantics back into
>> the reducer. Updated unit tests pass, and Synthetic Control with Canopy
>> produces 6 clusters. Kmeans also runs and produces 6 clusters. I really
>> don't like doing stuff in close() but see no practical alternative. Ideas
>> are still welcomed.
>>
>> Jeff
>>
>> Jeff Eastman wrote:
>>
>>> Adil Aijaz wrote:
>>>
>>>> 2. There is a bug in
>>>> examples/src/main/java/org/apache/mahout/clustering/syntheticcontrol/kmeans/Job.java
>>>> which called runJob from the main function with my provided arguments
>>>> transposed. So my convergenceDelta was interpreted as t1, t1 as t2, and
>>>> t2 as convergenceDelta. I will commit a patch as soon as I get approval
>>>> for open-source commits from my employer; however, I thought I'd put it
>>>> out there in case someone else is going through the same issue.
>>>
>>> r783585 fixed the parameter ordering bug. Still working on the Combiner
>>> problem.
>>>
>>> Thanks Adil,
>>> Jeff
>

--
Ted Dunning, CTO
DeepDyve
111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
http://www.deepdyve.com
858-414-0013 (m)
408-773-0220 (fax)
