[ 
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120243#comment-13120243
 ] 

Jeff Eastman commented on MAHOUT-825:
-------------------------------------

Sorry, but I continue to disagree. As you point out, clusterData is not a 
canopy-specific activity: it could be factored out of Canopy and Kmeans, since 
they do it identically (FuzzyK and Dirichlet can also do maximum-likelihood 
classification, as an option). clusterData simply assigns each point to the 
closest, maximum-likelihood cluster given the computed centroids and the 
chosen distance measure. Imposing additional semantics that cause some points 
to not be classified at all is just not correct, IMHO. T1, in particular, is 
unrelated to clusterData, and indeed a given point may be within T1 of multiple 
canopy clusters. If you want to impose additional semantics (e.g. to remove 
outliers), you need to do this in a separate processing step.
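
To make the distinction concrete, here is a minimal sketch of the two steps 
kept separate. It is not Mahout's actual code: the class NearestClusterSketch, 
its method names, the plain double[] points, and the Euclidean distance are 
all assumptions made for illustration, and it assumes at least one centroid.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Mahout.
final class NearestClusterSketch {

    // Euclidean distance, standing in for whichever distance measure is chosen.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Step 1: classification. Every point receives the index of its closest
    // centroid; no threshold is applied, so no point is left unclassified.
    static int closest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = distance(point, centroids[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    // Step 2: optional, separate post-processing. Keeps only the points that
    // lie within `threshold` of the centroid they were assigned to.
    static List<double[]> withoutOutliers(double[][] points,
                                          double[][] centroids,
                                          double threshold) {
        List<double[]> kept = new ArrayList<>();
        for (double[] p : points) {
            if (distance(p, centroids[closest(p, centroids)]) <= threshold) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

With this split, closest() always returns a cluster for every point, and a 
caller who wants outliers removed runs withoutOutliers() afterwards with 
whatever threshold suits the data.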
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: canopy-outside-t1-points-patch-1
>
>
> While finding the closest canopy, there is no check to ensure that it returns 
> only canopies within distance t1 of the point. This produces an incorrect 
> result, i.e. points outside t1 are grouped into canopies.


        
