[ 
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122550#comment-13122550
 ] 

Ted Dunning commented on MAHOUT-825:
------------------------------------

{quote}
I experimented with the distance calculation, and I am now using the radius 
instead of the t1 parameter.

{code}
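private boolean shouldCluster(Canopy canopy, Vector point) {
  if (clusterStrictly) {
    Vector currentCenter = canopy.getCenter();
    // Distance from the canopy center to the point; the center's squared
    // length is passed as the first argument.
    double distance = measure.distance(currentCenter.getLengthSquared(),
        currentCenter, point);
    // Squared length of the canopy's radius vector.
    double radius = canopy.getRadius().getLengthSquared();
    // Accept the point only if its distance is below three times that value.
    return distance < radius * 3;
  }
  // Non-strict mode: every point is accepted.
  return true;
}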
{code}

The positives and negatives of this approach are:

+ve : It does not depend on t1. I think the radius is a better way to measure 
distances from canopies (according to the discussion above). In my runs, the 
results were also "much better" than with t1: some meaningful points that 
were missed when using t1 are now being clustered.

-ve : I no longer have any control over the quality of the cluster, because 
the multiplier, 3, is a constant. With t1, I was at least able to control the 
quality of the cluster.
{quote}
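
One way to get that control back might be to make the multiplier a parameter 
instead of the constant 3. The following is only a minimal, self-contained 
sketch of that idea and is not Mahout code: the class RadiusFilter and the 
field radiusMultiplier are hypothetical names, plain double[] arrays stand in 
for Vector, and squared Euclidean distance is assumed in place of the 
configured measure.

```java
// Hypothetical sketch, not Mahout code: double[] stands in for Vector and
// squared Euclidean distance stands in for the configured DistanceMeasure.
final class RadiusFilter {
  private final boolean clusterStrictly;
  // Tunable replacement for the hard-coded constant 3.
  private final double radiusMultiplier;

  RadiusFilter(boolean clusterStrictly, double radiusMultiplier) {
    if (radiusMultiplier <= 0) {
      throw new IllegalArgumentException("radiusMultiplier must be positive");
    }
    this.clusterStrictly = clusterStrictly;
    this.radiusMultiplier = radiusMultiplier;
  }

  // Sum of squared components of v.
  static double lengthSquared(double[] v) {
    double sum = 0.0;
    for (double x : v) {
      sum += x * x;
    }
    return sum;
  }

  // Squared Euclidean distance between a and b.
  static double squaredDistance(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("dimension mismatch");
    }
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return sum;
  }

  // In strict mode, accept the point only if its squared distance from the
  // center is below radiusMultiplier times the squared radius length.
  boolean shouldCluster(double[] center, double[] radius, double[] point) {
    if (!clusterStrictly) {
      return true;
    }
    return squaredDistance(center, point)
        < lengthSquared(radius) * radiusMultiplier;
  }
}
```

With a radius vector of (1, 1) the squared radius is 2, so a multiplier of 3 
accepts points whose squared distance from the center is below 6, and a 
multiplier of 5 raises that bound to 10.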
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: Clustering Remote Points - Two Big, Useless 
> Clusters.txt, Not Clustering Remote Points - Two Meaningful Clusters.txt, 
> canopy-clusterFilter-t1, canopy-outside-t1-points-patch-1, 
> canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that the 
> returned canopy is within distance t1 of the point. This produces incorrect 
> results, i.e. points outside t1 are grouped into canopies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
