[ https://issues.apache.org/jira/browse/MAHOUT-929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217920#comment-13217920 ]
Paritosh Ranjan commented on MAHOUT-929:
----------------------------------------

Adding a few test cases for ClusterClassificationDriver will help you understand its functionality, which in turn will help with the clustering refactorings. Whether to add or skip the mapper test is up to you.

Just to reiterate: once you understand ClusterClassificationDriver, you can try using it in KMeansDriver. ClusterClassificationDriver will replace the clusterData phase of KMeansDriver.

Feel free to ask questions on MAHOUT-981 regarding the KMeansDriver refactoring.

> Refactor Clustering (Vector Classification) into a Separate Postprocess with Outlier Pruning
> --------------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-929
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-929
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Jeff Eastman
>            Assignee: Paritosh Ranjan
>             Fix For: 0.7
>
>         Attachments: Mahout-929, Mahout-929, Mahout-929, Mahout-929
>
>
> The current clustering drivers have a -cp option to produce a clusteredPoints
> directory containing the input vectors classified by the final clusters
> produced by the algorithm. These options are redundantly implemented in those
> drivers.
> - Factor out & implement an independent post processor to perform the
> classification step independently of the various clustering implementations.
> - Implement a pluggable outlier removal capability for this classifier.
> - Consider building off of the ClusterClassifier & ClusterIterator ideas.