[ 
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171152#comment-13171152
 ] 

Jeff Eastman commented on MAHOUT-825:
-------------------------------------

Scary that we've reversed positions :). But I tend to agree that adding outlier 
removal in a pluggable manner is the best long-term solution. Actually, I think 
it is possible to factor the classification-of-points step out of the various 
clustering algorithms into an independent job that accepts pointsIn, clustersIn 
and some other args and does the classification for all of them (the current 
per-algorithm implementations are really quite redundant). It might even use 
the ClusterClassifier.classify() method, which would help with 
classification/clustering convergence.

But this should be for 0.7, not 0.6.
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Improvement
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>            Assignee: Jeff Eastman
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: Clustering Remote Points - Two Big, Useless 
> Clusters.txt, MAHOUT-825.patch, Not Clustering Remote Points - Two Meaningful 
> Clusters.txt, canopy-clusterFilter-t1, canopy-outlier-elimination, 
> canopy-outside-t1-points-patch-1, canopy-radius-based-outlier-elimination, 
> canopy-strict-clustering-flag
>
>
> When finding the closest canopy, there is no check to ensure that the returned 
> canopy is within distance t1 of the point. This produces incorrect results: 
> points outside t1 are grouped into canopies.
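
To make the missing check concrete, here is a minimal sketch of a 
closest-canopy lookup that rejects points farther than t1 from every center. 
It is not the Mahout code and not the attached patch: the class 
CanopyT1Check, the method closestWithinT1, the plain double[] points, the 
Euclidean distance, and the inclusive "<= t1" boundary are all assumptions 
made for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; names and types are not taken from Mahout.
final class CanopyT1Check {

    /** Euclidean distance between two points of equal dimension. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the closest center, or -1 when no center lies
     * within t1 of the point (the point is then treated as an outlier).
     * The inclusive boundary (<= t1) is an assumption of this sketch.
     */
    static int closestWithinT1(double[] point, List<double[]> centers, double t1) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(point, centers.get(i));
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // Without this final check the nearest center is returned even
        // when it is farther away than t1, which is the reported problem.
        return bestDist <= t1 ? best : -1;
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(
                new double[] {0.0, 0.0},
                new double[] {10.0, 10.0});
        double t1 = 3.0;
        System.out.println(closestWithinT1(new double[] {1.0, 1.0}, centers, t1));
        System.out.println(closestWithinT1(new double[] {9.0, 10.0}, centers, t1));
        System.out.println(closestWithinT1(new double[] {5.0, 5.0}, centers, t1));
    }
}
```

With these two centers and t1 = 3.0, the point (5, 5) is about 7.07 from 
both centers, so the lookup returns -1 instead of assigning it to either 
canopy. Returning a sentinel keeps the decision about what to do with such 
points (drop them, or collect them separately) with the caller.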
