[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13120294#comment-13120294 ]
Jeff Eastman commented on MAHOUT-825:
-------------------------------------
Paritosh,
I remain unconvinced that you have demonstrated your assertion that "current
Canopy Clustering is simply not working".
Sure, T1 is related to canopy generation, but it is unrelated to the
maximum-likelihood classification step, which is common to most of the other
clustering algorithms. That step simply assigns each point to its closest
cluster. If you wish to impose additional semantics on the clustering outputs
(e.g. outlier elimination), you need to do this in a separate processing step.
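To make the distinction concrete, here is a minimal sketch of the two steps as
separate operations. It is not Mahout code: the class name, the method names
(closest, withinT1) and the plain Euclidean distance are all assumptions chosen
for illustration, and a real job would use the configured distance measure.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Mahout.
public class ClosestCanopySketch {

    // Euclidean distance, standing in for whatever distance measure is configured.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Classification step: return the index of the nearest center.
    // No threshold is applied here, so every point gets assigned somewhere.
    static int closest(double[] point, double[][] centers) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.length; i++) {
            double d = distance(point, centers[i]);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }

    // Separate post-processing step (assumed): keep only the points that lie
    // within t1 of the center they were assigned to; the rest are treated as outliers.
    static List<double[]> withinT1(List<double[]> points, double[][] centers, double t1) {
        List<double[]> kept = new ArrayList<>();
        for (double[] p : points) {
            int c = closest(p, centers);
            if (c >= 0 && distance(p, centers[c]) <= t1) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[][] centers = {{0.0, 0.0}, {10.0, 10.0}};
        List<double[]> points = new ArrayList<>();
        points.add(new double[] {1.0, 1.0});
        points.add(new double[] {50.0, 50.0});

        for (double[] p : points) {
            System.out.println("assigned to center " + closest(p, centers));
        }
        System.out.println("points kept within t1: " + withinT1(points, centers, 3.0).size());
    }
}
```

Keeping the filter outside the assignment step means the classification
behaves the same way across algorithms, and the outlier policy can be changed
or dropped without touching it.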
I'm not going to commit this patch, for the reasons I've stated above. If your
arguments convince one of the other committers to weigh in and argue for a
different outcome, then we can continue this discussion. For now, I'm marking
this Won't Fix.
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6
> Environment: windows, linux
> Reporter: Paritosh Ranjan
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: canopy-outside-t1-points-patch-1
>
>
> While finding the closest canopy, there is no check to ensure that the
> returned canopy is within distance t1 of the point. This produces an
> incorrect result, i.e. points outside t1 are grouped into canopies.