[ https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170702#comment-13170702 ]
Jeff Eastman commented on MAHOUT-825:
-------------------------------------
Hmm, the recent major reformatting has invalidated the patch. It is not clear
where to go from here. Two options:
a) Back up to a minimalist fix that stops clusterings at distance > T1, as the
issue requests. These shouldn't happen in sequential mode; the outliers result
from the subsequent cluster mergers in the reducer. (Quit grinning, Paritosh.)
b) Write a new post-processor to generalize the outlier filter capability.
Target 0.7.
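For option (a), the guard could look roughly like the sketch below. It is only an illustration under stated assumptions: the class and method names (StrictCanopyAssignment, closestCanopyWithinT1), the plain Euclidean distance, and the inclusive "<= t1" boundary are all hypothetical and are not Mahout's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical illustration only; none of these names come from Mahout.
final class StrictCanopyAssignment {

    // Plain Euclidean distance, standing in for a configurable distance measure.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the index of the closest canopy center, or an empty result when
    // even the closest center is farther than t1 from the point.
    static OptionalInt closestCanopyWithinT1(double[] point, List<double[]> centers, double t1) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        // The check the issue reports as missing: reject matches beyond t1.
        // Treating the boundary as inclusive is an assumption of this sketch.
        return (best >= 0 && bestDistance <= t1) ? OptionalInt.of(best) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<double[]> centers = Arrays.asList(new double[] {0.0, 0.0}, new double[] {10.0, 0.0});
        double t1 = 3.0;
        // A point near the first center, then a remote point between the two centers.
        System.out.println(closestCanopyWithinT1(new double[] {1.0, 1.0}, centers, t1));
        System.out.println(closestCanopyWithinT1(new double[] {5.0, 0.0}, centers, t1));
    }
}
```

Returning an empty result rather than the nearest canopy leaves the remote point unassigned, so the caller decides whether to drop it or hand it to a later outlier filter along the lines of option (b).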
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Improvement
> Components: Clustering
> Affects Versions: 0.6
> Environment: windows, linux
> Reporter: Paritosh Ranjan
> Assignee: Jeff Eastman
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: Clustering Remote Points - Two Big, Useless
> Clusters.txt, MAHOUT-825.patch, Not Clustering Remote Points - Two Meaningful
> Clusters.txt, canopy-clusterFilter-t1, canopy-outlier-elimination,
> canopy-outside-t1-points-patch-1, canopy-radius-based-outlier-elimination,
> canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that it returns
> canopies which are within distance t1 of the point. This produces incorrect
> results, i.e. points outside t1 are grouped into canopies.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira