[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paritosh Ranjan updated MAHOUT-825:
-----------------------------------
Comment: was deleted
(was: The clusterData phase is also run only when the runClustering variable is
true (as shown in the code snippet below). So the canopies have already been
generated and computed before the clusterData phase starts.
if (runClustering) {
  clusterData(conf, input, clustersOut, output, measure, t1, t2, runSequential);
}
clusterData only groups (classifies) the points, i.e. it identifies which point
belongs to which canopy (center vector). Adding a remote point (one farther than
t1) to the already generated canopies defeats the purpose of this feature; there
is no use in doing it.
)
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6
> Environment: windows, linux
> Reporter: Paritosh Ranjan
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: canopy-outside-t1-points-patch-1
>
>
> While finding the closest canopy, there is no check to ensure that the
> returned canopy is within distance t1 of the point. This produces incorrect
> results, i.e. points outside t1 are grouped into canopies.
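The missing check can be illustrated with a minimal Java sketch. It is not Mahout's code: the `Canopy` class, `distance` (plain Euclidean distance standing in for a pluggable distance measure) and `closestWithinT1` are hypothetical names, and treating a distance exactly equal to t1 as "within t1" is an assumption. The lookup still picks the nearest canopy, but returns nothing when that canopy is farther than t1, so the point is left out instead of being grouped into a distant canopy.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical sketch (not Mahout's API): pick the closest canopy for a point,
 * but only if that canopy's center lies within distance t1 of the point.
 */
public class CanopyAssignment {

    /** Minimal stand-in for a canopy: an id plus its center vector. */
    static final class Canopy {
        final int id;
        final double[] center;

        Canopy(int id, double[] center) {
            this.id = id;
            this.center = center;
        }
    }

    /** Euclidean distance, assumed here instead of a configurable distance measure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the closest canopy, or Optional.empty() when even the closest
     * canopy is farther than t1 (assumption: distance == t1 counts as inside).
     */
    static Optional<Canopy> closestWithinT1(double[] point, List<Canopy> canopies, double t1) {
        Canopy closest = null;
        double minDist = Double.POSITIVE_INFINITY;
        for (Canopy c : canopies) {
            double d = distance(point, c.center);
            if (d < minDist) {
                minDist = d;
                closest = c;
            }
        }
        // The check the issue reports as missing: reject canopies beyond t1.
        if (closest == null || minDist > t1) {
            return Optional.empty();
        }
        return Optional.of(closest);
    }

    public static void main(String[] args) {
        List<Canopy> canopies = List.of(
            new Canopy(0, new double[] {0.0, 0.0}),
            new Canopy(1, new double[] {10.0, 10.0}));
        double t1 = 3.0;

        // (1, 1) is about 1.41 from canopy 0, so it is assigned: prints Optional[0]
        System.out.println(closestWithinT1(new double[] {1.0, 1.0}, canopies, t1).map(c -> c.id));
        // (5, 5) is about 7.07 from both canopies, outside t1: prints Optional.empty
        System.out.println(closestWithinT1(new double[] {5.0, 5.0}, canopies, t1).map(c -> c.id));
    }
}
```

Returning `Optional.empty()` forces the caller to decide explicitly what to do with such outlier points (skip them or report them separately) rather than silently assigning them to the nearest canopy.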
--
This message is automatically generated by JIRA.