[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120276#comment-13120276 ]
Paritosh Ranjan commented on MAHOUT-825:
----------------------------------------
As you said,
"ClusterData simply assigns each point to the closest, maximum-likelihood
cluster given the computed centroids and the distance measure chosen."
That is not what happens for Canopy right now, and this patch fixes it.
I disagree with your statement that "T1, in particular, is unrelated to
clusterData". t1 is certainly related to Canopy. The threshold could be t1 or
t2, but either way the canopies would be meaningful. Right now, any arbitrary
point gets clustered into a canopy, which is simply incorrect. This patch
makes sense because it fixes that problem.
Of course, multiple things can be done with the design and the clustering
approach. But do you agree that this patch fixes a current bug in the code?
If yes, then I think we should apply it, as the current Canopy clustering is
simply not working and gives the user a really hard time. I have experienced
that myself, and this patch fixes it.
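To make the missing check concrete, here is a minimal sketch of the idea. It
is not the attached patch and not Mahout's actual code: the class
ClosestCanopySketch, the method closestWithinT1, the use of plain double[]
vectors, Euclidean distance, and the strict "< t1" comparison are all
assumptions made for illustration. The point is only that the closest canopy
is accepted when it lies within t1, and the point is left unassigned
otherwise.

```java
import java.util.List;

// Hypothetical helper for illustration only; not Mahout's CanopyClusterer.
final class ClosestCanopySketch {

  // Plain Euclidean distance, standing in for whichever distance measure is configured.
  static double distance(double[] a, double[] b) {
    double sum = 0.0;
    for (int i = 0; i < a.length; i++) {
      double d = a[i] - b[i];
      sum += d * d;
    }
    return Math.sqrt(sum);
  }

  // Returns the index of the nearest canopy center that lies within t1 of the
  // point, or -1 when no center is within t1 (the point stays unassigned).
  static int closestWithinT1(double[] point, List<double[]> centers, double t1) {
    int best = -1;
    double bestDistance = Double.MAX_VALUE;
    for (int i = 0; i < centers.size(); i++) {
      double d = distance(point, centers.get(i));
      // Assumed threshold check: candidates at distance t1 or more are ignored.
      if (d < t1 && d < bestDistance) {
        best = i;
        bestDistance = d;
      }
    }
    return best;
  }
}
```

Without the "d < t1" condition, the loop always returns some canopy, however
far away it is, which is the behaviour described in this issue.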
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6
> Environment: windows, linux
> Reporter: Paritosh Ranjan
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: canopy-outside-t1-points-patch-1
>
>
> While finding the closest canopy, there is no check to ensure that only
> canopies within distance t1 of the point are returned. This produces an
> incorrect result, i.e. points outside t1 are grouped into canopies.