[
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171152#comment-13171152
]
Jeff Eastman commented on MAHOUT-825:
-------------------------------------
Scary that we've reversed positions :). But I tend to agree that adding outlier
removal in a pluggable manner is the best long-term solution. Actually, I think
it is possible to factor the classification step (assigning points to clusters)
out of the various clustering algorithms and into an independent job which
accepts pointsIn, clustersIn and some other args and does the classification
for all of them (the current implementations are really quite redundant). It
might even use the ClusterClassifier.classify() method, which would help with
classification/clustering convergence.
But this should be for 0.7, not 0.6.
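
To make the proposal concrete, here is a rough sketch of a shared
classification step with a pluggable outlier rule. It is not Mahout code: the
class ClassificationStepSketch, the classify() signature, the use of plain
double[] vectors with Euclidean distance, and the -1 marker for dropped points
are all assumptions made for illustration. Only the parameter names pointsIn
and clustersIn come from the comment above.

```java
import java.util.List;
import java.util.function.DoublePredicate;

// Hypothetical sketch of one classification step shared by all clustering
// algorithms. None of these names exist in Mahout.
final class ClassificationStepSketch {

    // Euclidean distance; a real job would take a pluggable distance measure.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // For each point, returns the index of the nearest cluster center, or -1
    // when the pluggable rule marks the nearest distance as an outlier.
    static int[] classify(List<double[]> pointsIn,
                          List<double[]> clustersIn,
                          DoublePredicate isOutlier) {
        int[] assignment = new int[pointsIn.size()];
        for (int p = 0; p < pointsIn.size(); p++) {
            int best = -1;
            double bestDist = Double.POSITIVE_INFINITY;
            for (int c = 0; c < clustersIn.size(); c++) {
                double d = distance(pointsIn.get(p), clustersIn.get(c));
                if (d < bestDist) {
                    bestDist = d;
                    best = c;
                }
            }
            assignment[p] = (best >= 0 && !isOutlier.test(bestDist)) ? best : -1;
        }
        return assignment;
    }
}
```

Passing a rule such as d -> d > 5.0 drops remote points, while d -> false
keeps the current behaviour of always assigning the nearest cluster, so the
outlier policy stays a caller's choice rather than part of each algorithm.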
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.6
> Environment: windows, linux
> Reporter: Paritosh Ranjan
> Assignee: Jeff Eastman
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: Clustering Remote Points - Two Big, Useless
> Clusters.txt, MAHOUT-825.patch, Not Clustering Remote Points - Two Meaningful
> Clusters.txt, canopy-clusterFilter-t1, canopy-outlier-elimination,
> canopy-outside-t1-points-patch-1, canopy-radius-based-outlier-elimination,
> canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that the
> returned canopy is within distance t1 of the point. This produces incorrect
> results, i.e. points outside t1 are grouped into canopies.
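
The missing check described in the issue can be illustrated with a small,
self-contained sketch. This is not the attached patch and not Mahout's
implementation: the class StrictCanopyAssignment, the method
closestCanopyWithinT1(), the double[] representation of points and canopy
centers, the Euclidean distance, and the strict "<" comparison against t1 are
all assumptions made for the example.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical illustration of a closest-canopy lookup that respects t1.
final class StrictCanopyAssignment {

    // Euclidean distance between two points of equal dimension.
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the closest canopy center, but only if that center
    // lies within t1 of the point; otherwise the point stays unassigned.
    static Optional<Integer> closestCanopyWithinT1(double[] point,
                                                   List<double[]> centers,
                                                   double t1) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = euclidean(point, centers.get(i));
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // The t1 check the issue reports as missing (strictness is assumed).
        return (best >= 0 && bestDist < t1) ? Optional.of(best) : Optional.empty();
    }
}
```

With this shape, a remote point yields Optional.empty() instead of being
forced into whichever canopy happens to be nearest, and the caller decides
whether to drop it or handle it separately.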