[ https://issues.apache.org/jira/browse/MAHOUT-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243282#comment-13243282 ]
Jeff Eastman commented on MAHOUT-988:
-------------------------------------
It will be very interesting to compare the performance of the new k-means with
the old version. Unlike the old implementation, the ClusterIterator solution
does not use a combiner. Instead, it does all the aggregation that the combiner
used to do inside the mapper, and outputs all the trained clusters once at the
end of mapper execution. This means that each CIMapper will write only k
records, one for each cluster in the prior, and thus the copy-merge step should
be very quick. Since each reducer (if numReducers == k) will see only numMappers
input records, the reduce step should be pretty quick too. At least that's the
expectation...
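
To make the record counts above concrete, here is a minimal sketch of the
in-mapper aggregation pattern. It is plain Java with no Hadoop or Mahout
classes, and it is not the actual CIMapper code: the class name, the Partial
type, the squared Euclidean distance, and the handling of empty clusters are
all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: plain Java, no Hadoop or Mahout classes. All names are hypothetical.
final class InMapperAggregationSketch {

    /** Partial statistics for one cluster: how many points, and their per-dimension sum. */
    static final class Partial {
        long count;
        final double[] sum;
        Partial(int dims) { this.sum = new double[dims]; }
    }

    private final double[][] prior;   // the k prior cluster centers
    private final Partial[] partials; // one accumulator per cluster, held in mapper memory

    InMapperAggregationSketch(double[][] prior) {
        this.prior = prior;
        this.partials = new Partial[prior.length];
        for (int i = 0; i < prior.length; i++) {
            partials[i] = new Partial(prior[i].length);
        }
    }

    /** Per-point step: assign to the nearest prior center and accumulate. Emits nothing. */
    void map(double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < prior.length; i++) {
            double dist = 0.0;
            for (int j = 0; j < point.length; j++) {
                double diff = point[j] - prior[i][j];
                dist += diff * diff; // squared Euclidean distance (an assumption)
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        partials[best].count++;
        for (int j = 0; j < point.length; j++) {
            partials[best].sum[j] += point[j];
        }
    }

    /** End-of-mapper step: emit exactly k records, one per cluster in the prior. */
    List<Partial> cleanup() {
        List<Partial> out = new ArrayList<>();
        for (Partial p : partials) {
            out.add(p);
        }
        return out;
    }

    /** Reduce step for cluster i: merge one partial from each mapper into a new center. */
    static double[] reduce(int i, List<List<Partial>> mapperOutputs, double[] priorCenter) {
        long count = 0;
        double[] sum = new double[priorCenter.length];
        for (List<Partial> output : mapperOutputs) {
            Partial p = output.get(i);
            count += p.count;
            for (int j = 0; j < sum.length; j++) {
                sum[j] += p.sum[j];
            }
        }
        if (count == 0) {
            return priorCenter.clone(); // empty cluster: keep the prior center (an assumption)
        }
        for (int j = 0; j < sum.length; j++) {
            sum[j] /= count;
        }
        return sum;
    }
}
```

With two such mappers and k = 2, each cleanup() call returns 2 records no matter
how many points the mapper saw, and each reduce() call reads 2 partials, one per
mapper. That is the k-records-per-mapper and numMappers-records-per-reducer
behavior described above.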
> Convert K-means buildClusters to use new ClusterIterator
> --------------------------------------------------------
>
> Key: MAHOUT-988
> URL: https://issues.apache.org/jira/browse/MAHOUT-988
> Project: Mahout
> Issue Type: Sub-task
> Components: Clustering
> Affects Versions: 0.6
> Reporter: Jeff Eastman
> Assignee: Paritosh Ranjan
> Fix For: 0.7
>
>
> Refactor the current K-means implementation to use the
> ClusterIterator/Classifier implementation. This will replace the mapper,
> combiner, reducer, clusterer and many unit tests but will not modify the
> other driver APIs, thus retaining compatibility with existing CLI.