[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122568#comment-13122568 ]

Paritosh Ranjan commented on MAHOUT-825:
----------------------------------------

CanopyDriver's run method already takes 12 parameters, so adding another 
parameter would not be good.

Instead, I will change the boolean parameter "clusterStrictly" (the new flag) 
to an int, "clusterStrictness". 
clusterStrictness must be greater than 1 for any strictness to apply, and the 
strictness will increase with the value of clusterStrictness.

The default value of "clusterStrictness" will be negative.
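
A minimal sketch of how the int could stand in for the old boolean flag, 
assuming only what is stated above: values of 1 or less (including the 
negative default) mean no strictness. The names StrictnessOption, isStrict 
and keepPoint are hypothetical, not Mahout API. How larger values should 
tighten the rule is not specified in this comment, so the sketch models 
only on/off.

```java
// Hypothetical sketch: StrictnessOption, isStrict and keepPoint are
// illustrative names, not part of Mahout's CanopyDriver API.
final class StrictnessOption {
    // Negative default: strict clustering is switched off.
    static final int DEFAULT_CLUSTER_STRICTNESS = -1;

    private StrictnessOption() {}

    // Only values greater than 1 enable any strictness.
    static boolean isStrict(int clusterStrictness) {
        return clusterStrictness > 1;
    }

    // Assumed on/off rule: when strict, a point stays in a canopy only if
    // it lies strictly within t1 of the canopy center. How higher values
    // should tighten this further is not modeled here.
    static boolean keepPoint(int clusterStrictness, double distanceToCenter, double t1) {
        return !isStrict(clusterStrictness) || distanceToCenter < t1;
    }
}
```

Packing the option into the existing int slot keeps the run method at its 
current parameter count, which is the motivation given above.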

Excluding points only from the centroid computation would certainly improve 
the quality of the centroids, but not excluding those points from the 
clusters themselves would still create at least a few useless clusters.

Still, I will try to exclude distant points from the centroid calculation, 
either in this patch or in a later one.
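
The centroid-only option can be sketched as follows: when a canopy's 
centroid is recomputed, points farther than t1 from its center are left out 
of the average. This is an illustration, not the patch; it assumes plain 
double[] vectors and Euclidean distance, whereas Mahout works with its own 
Vector type and a configurable DistanceMeasure. The class name T1Centroid 
is hypothetical.

```java
import java.util.List;

// Illustrative sketch only: double[] vectors and Euclidean distance are
// assumptions standing in for Mahout's Vector and DistanceMeasure.
final class T1Centroid {
    private T1Centroid() {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Averages only the points lying strictly within t1 of the canopy
    // center; farther points do not influence the centroid. If no point
    // qualifies, a copy of the original center is returned.
    static double[] centroidWithinT1(double[] center, List<double[]> points, double t1) {
        double[] sum = new double[center.length];
        int count = 0;
        for (double[] p : points) {
            if (euclidean(center, p) < t1) {
                for (int i = 0; i < sum.length; i++) {
                    sum[i] += p[i];
                }
                count++;
            }
        }
        if (count == 0) {
            return center.clone();
        }
        for (int i = 0; i < sum.length; i++) {
            sum[i] /= count;
        }
        return sum;
    }
}
```

Note that this only cleans up the centroid: a distant point would still be 
assigned to the canopy, which is exactly the limitation described above.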
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: Clustering Remote Points - Two Big, Useless 
> Clusters.txt, Not Clustering Remote Points - Two Meaningful Clusters.txt, 
> canopy-clusterFilter-t1, canopy-outside-t1-points-patch-1, 
> canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that it 
> returns a canopy within distance t1 of the point. This produces incorrect 
> results, i.e. points outside t1 are grouped into canopies.
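
The missing check described in the issue amounts to rejecting the nearest 
canopy when it is not within t1 of the point. A hedged sketch, again 
assuming double[] vectors and Euclidean distance in place of Mahout's 
DistanceMeasure; ClosestCanopy and closestWithinT1 are hypothetical names, 
and returning -1 for "no canopy" is a choice made for this example.

```java
import java.util.List;

// Illustrative sketch only: not Mahout's CanopyClusterer code.
final class ClosestCanopy {
    private ClosestCanopy() {}

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the nearest canopy center, but only if that
    // center lies strictly within t1 of the point; otherwise returns -1.
    // Ties go to the earlier center.
    static int closestWithinT1(double[] point, List<double[]> centers, double t1) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = euclidean(point, centers.get(i));
            if (d < bestDist) {
                bestDist = d;
                best = i;
            }
        }
        // The t1 check that the issue reports as missing.
        return bestDist < t1 ? best : -1;
    }
}
```

A caller would then skip, or handle separately, any point for which -1 
comes back, instead of forcing it into a distant canopy.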
