[
https://issues.apache.org/jira/browse/MAHOUT-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dan Filimon resolved MAHOUT-1181.
---------------------------------
Resolution: Fixed
Fix Version/s: 0.8
Committed revision 1482907.
> Adding StreamingKMeans MapReduce classes
> ----------------------------------------
>
> Key: MAHOUT-1181
> URL: https://issues.apache.org/jira/browse/MAHOUT-1181
> Project: Mahout
> Issue Type: New Feature
> Components: Clustering
> Affects Versions: 0.8
> Reporter: Dan Filimon
> Fix For: 0.8
>
> Attachments: MAHOUT_1181.patch, MAHOUT_1181_props.patch,
> MAHOUT_1181_test.patch
>
>
> This patch implements the MapReduce version of StreamingKMeans for
> MAHOUT-1154.
> It adds 5 new classes:
> - CentroidWritable: class representing a centroid that can be written to a
> SequenceFile
> - StreamingKMeansDriver: class extending AbstractJob that is the entry
> point to the MapReduce job
> - StreamingKMeansMapper: mapper, running StreamingKMeans (see MAHOUT-1162)
> clustering the points one by one
> - StreamingKMeansReducer: reducer, running BallKMeans (see MAHOUT-1162) a
> number of times and picking the clustering with the lowest total clustering
> cost.
> The cost is determined by randomly splitting the incoming centroids into a
> "training" and "test" set, computing the centroids on the training set and
> the cost on the test set. The intent is to see whether the centroids actually
> describe the distribution of the points or not.
> - StreamingKMeansUtilMR: helper class with a method to instantiate a searcher
> from a Configuration.
> Additionally, there is a test class StreamingKMeansTestMR that tests the
> mapper and the reducer, both individually and together, using MRUnit.
> !!!
> Since the tests use MRUnit, the core pom.xml file must add it as a
> dependency. We depend on the 1.0 snapshot, which is not yet released (it
> will be very soon), hence the updated pom.xml is not provided for now.
> !!!
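
For readers of the archive, the reducer's selection step described in the
quoted issue (split the incoming centroids into a "training" and a "test" set,
cluster the training set, score on the test set, keep the cheapest run) can be
sketched roughly as below. This is only an illustration under stated
assumptions, not the code from the patch: the class HoldoutCostSketch and all
of its method names are hypothetical, plain Lloyd iterations stand in for
BallKMeans, points are bare double[] arrays, and centroid weights are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch; NOT Mahout's StreamingKMeansReducer.
final class HoldoutCostSketch {

  // Squared Euclidean distance between two points of equal dimension.
  static double squaredDistance(double[] a, double[] b) {
    double sum = 0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // Total cost: for each point, squared distance to its nearest centroid.
  static double cost(List<double[]> points, List<double[]> centroids) {
    double total = 0;
    for (double[] p : points) {
      double best = Double.POSITIVE_INFINITY;
      for (double[] c : centroids) {
        best = Math.min(best, squaredDistance(p, c));
      }
      total += best;
    }
    return total;
  }

  // Plain Lloyd iterations, used here as a stand-in for BallKMeans.
  // Assumes train.size() >= k.
  static List<double[]> lloyd(List<double[]> train, int k, int iterations, Random rng) {
    List<double[]> seeds = new ArrayList<>(train);
    Collections.shuffle(seeds, rng);
    List<double[]> centroids = new ArrayList<>();
    for (int i = 0; i < k; i++) {
      centroids.add(seeds.get(i).clone());
    }
    int dim = train.get(0).length;
    for (int it = 0; it < iterations; it++) {
      double[][] sums = new double[k][dim];
      int[] counts = new int[k];
      for (double[] p : train) {
        int nearest = 0;
        for (int j = 1; j < k; j++) {
          if (squaredDistance(p, centroids.get(j))
              < squaredDistance(p, centroids.get(nearest))) {
            nearest = j;
          }
        }
        counts[nearest]++;
        for (int d = 0; d < dim; d++) {
          sums[nearest][d] += p[d];
        }
      }
      for (int j = 0; j < k; j++) {
        if (counts[j] == 0) {
          continue; // an empty cluster keeps its previous centroid
        }
        for (int d = 0; d < dim; d++) {
          sums[j][d] /= counts[j];
        }
        centroids.set(j, sums[j]);
      }
    }
    return centroids;
  }

  // Runs the clustering numRuns times, each on a fresh random train/test
  // split, and returns the centroids with the lowest cost on the test set.
  static List<double[]> bestOfRuns(List<double[]> points, int k, int numRuns,
                                   double testFraction, long seed) {
    Random rng = new Random(seed);
    List<double[]> best = null;
    double bestCost = Double.POSITIVE_INFINITY;
    for (int run = 0; run < numRuns; run++) {
      List<double[]> shuffled = new ArrayList<>(points);
      Collections.shuffle(shuffled, rng);
      int testSize = (int) (shuffled.size() * testFraction);
      List<double[]> test = shuffled.subList(0, testSize);
      List<double[]> train = shuffled.subList(testSize, shuffled.size());
      List<double[]> centroids = lloyd(train, k, 10, rng);
      double testCost = cost(test, centroids);
      if (centroids != null && testCost < bestCost) {
        bestCost = testCost;
        best = centroids;
      }
    }
    return best;
  }
}
```

Calling HoldoutCostSketch.bestOfRuns(points, k, numRuns, 0.25, seed) returns
the centroids of the run that scored best on its held-out quarter. Scoring on
held-out points rather than on the training points is what lets the cost
reflect whether the centroids describe the overall distribution, which is the
intent stated in the issue.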