[ 
https://issues.apache.org/jira/browse/MAHOUT-988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243282#comment-13243282
 ] 

Jeff Eastman commented on MAHOUT-988:
-------------------------------------

It will be very interesting to compare the performance of the new k-means with 
that of the old version. Unlike the old implementation, the ClusterIterator 
solution does not use a combiner. Instead, each mapper does the aggregation 
that the combiner used to do and outputs all of its trained clusters once, at 
the end of mapper execution. Each CIMapper therefore writes only k records, 
one for each cluster in the prior, so the copy-merge step should be very 
quick. Since each reducer (if numReducers == k) will see only numMappers input 
records, the reduce step should be pretty quick too. At least that's the 
expectation...
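
To make the record counts concrete, here is a rough sketch of the in-mapper 
aggregation pattern in plain Java, with no Hadoop or Mahout dependencies. It 
is not the actual CIMapper code: the class InMapperAggregationSketch, the 
Partial accumulator, and the mapSplit/reduce helpers are hypothetical names 
invented for illustration, and the nearest-center assignment with squared 
Euclidean distance is an assumption standing in for whatever the classifier 
really does.

```java
import java.util.Arrays;
import java.util.List;

public class InMapperAggregationSketch {

    /** Running vector sum and point count for one cluster (hypothetical stand-in). */
    static final class Partial {
        final double[] sum;
        long count;

        Partial(int dim) { sum = new double[dim]; }

        void observe(double[] point) {
            for (int i = 0; i < sum.length; i++) sum[i] += point[i];
            count++;
        }

        void merge(Partial other) {
            for (int i = 0; i < sum.length; i++) sum[i] += other.sum[i];
            count += other.count;
        }
    }

    /** Index of the prior center closest to the point (squared Euclidean distance). */
    static int nearest(double[][] prior, double[] point) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < prior.length; c++) {
            double d = 0.0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - prior[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    /** One simulated mapper: aggregates its whole split in memory, then emits exactly k records. */
    static Partial[] mapSplit(double[][] prior, List<double[]> split) {
        Partial[] out = new Partial[prior.length];
        for (int c = 0; c < out.length; c++) out[c] = new Partial(prior[c].length);
        for (double[] point : split) out[nearest(prior, point)].observe(point);
        return out; // k records, no matter how many points the split held
    }

    /** One simulated reducer: merges one partial per mapper into the new center. */
    static double[] reduce(double[] priorCenter, List<Partial> partials) {
        Partial total = new Partial(priorCenter.length);
        for (Partial p : partials) total.merge(p);
        if (total.count == 0) return priorCenter.clone(); // empty cluster keeps its prior center
        double[] center = new double[priorCenter.length];
        for (int i = 0; i < center.length; i++) center[i] = total.sum[i] / total.count;
        return center;
    }

    public static void main(String[] args) {
        double[][] prior = {{0, 0}, {10, 10}};
        Partial[] m1 = mapSplit(prior, Arrays.asList(new double[]{1, 1}, new double[]{9, 9}));
        Partial[] m2 = mapSplit(prior, Arrays.asList(
                new double[]{2, 0}, new double[]{11, 10}, new double[]{10, 12}));
        for (int c = 0; c < prior.length; c++) {
            // Each reducer sees numMappers (here 2) partials for its cluster.
            double[] center = reduce(prior[c], Arrays.asList(m1[c], m2[c]));
            System.out.println("cluster " + c + ": " + Arrays.toString(center));
        }
    }
}
```

With two simulated mappers and k = 2, each mapper returns two partials and 
each reducer merges two of them, regardless of how many points were in the 
splits. The cost is that every mapper must hold k accumulators in memory 
until its split is exhausted.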
                
> Convert K-means buildClusters to use new ClusterIterator
> --------------------------------------------------------
>
>                 Key: MAHOUT-988
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-988
>             Project: Mahout
>          Issue Type: Sub-task
>          Components: Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>
> Refactor the current K-means implementation to use the 
> ClusterIterator/Classifier implementation. This will replace the mapper, 
> combiner, reducer, clusterer and many unit tests but will not modify the 
> other driver APIs, thus retaining compatibility with the existing CLI.


        
