[ https://issues.apache.org/jira/browse/MAHOUT-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176266#comment-13176266 ]
Paritosh Ranjan commented on MAHOUT-931:
----------------------------------------

Ok. Should I proceed like this:

Step 1) Encapsulate cluster-specific CLI arguments (ClusterConfig and its cluster-specific implementations).
Step 2) Implement all clustering policies.
Step 3) Implement outlier removal in policies.
  Step 3a) First cut: use probability-threshold-based outlier removal (as described in the previous comment).
  Step 3b) Final cut: use cluster-specific arguments for outlier removal.
Step 4) Replace clustering algorithms with Classifier/Iterator (for the algorithms where this can be done).

Regarding naming, I would say that readability should always be given importance. I consider naming an important part of software development, whether working alone or in a team. I prefer readable code to JavaDocs. The current code does not have ample JavaDocs, so at least the naming should be appropriate. I am not pushing for a name change, just expressing my thoughts.

If you agree with implementing things in the order (steps) I mentioned, then I can start implementing them. If you have any suggestions to improve them, please share them.

> Implement a pluggable outlier removal capability for cluster classifiers
> ------------------------------------------------------------------------
>
>                 Key: MAHOUT-931
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-931
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Classification, Clustering
>    Affects Versions: 0.6
>            Reporter: Paritosh Ranjan
>             Fix For: 0.7
>
>         Attachments: MAHOUT-931
>
>
> A pluggable outlier removal capability while classifying the clusters is
> needed. The classification and outlier removal implementations, both should
> be completely separate entities for better abstraction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira