[
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122930#comment-13122930
]
Jeff Eastman commented on MAHOUT-825:
-------------------------------------
-2 on incorporating clusterStrictness in the Canopy generation phase. This
completely changes the semantics of canopy. Furthermore, the radius of a canopy
is not even calculated until cleanup(), after all the points have been
processed in the mapper.
-0.75 on using clusterStrictness in the Canopy classification phase to remove
outliers. If we are going to add an outlier filter of this flavor then it
should be added to all clustering classification codes, not just Canopy.
I still do not see the merit of incorporating this (experimental, untested and
still evolving) outlier filtering scheme into the classification phase at all.
Subsequent processing steps which read the clusteredPoints can easily also load
the clusters and perform whatever outlier removal is best for that particular
application. It does not need to be a separate step, and it is quite
application-specific.
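
As a rough illustration of the kind of downstream filtering meant above, here
is a minimal sketch in plain Java. It assumes the clustered points and cluster
centers have already been loaded into memory. Every name in it
(OutlierFilterSketch, ClusteredPoint, removeOutliers, maxDistance) is
hypothetical and not part of Mahout, and Euclidean distance merely stands in
for whatever distance measure the clustering job actually used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing sketch; none of these types are Mahout classes.
final class OutlierFilterSketch {

    /** A point together with the id of the cluster it was assigned to. */
    static final class ClusteredPoint {
        final int clusterId;
        final double[] vector;

        ClusteredPoint(int clusterId, double[] vector) {
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    /** Euclidean distance; an assumed stand-in for the job's distance measure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Keeps only the points that lie within maxDistance of their assigned
     * cluster center. Points whose cluster id has no known center are dropped.
     */
    static List<ClusteredPoint> removeOutliers(List<ClusteredPoint> points,
                                               Map<Integer, double[]> centers,
                                               double maxDistance) {
        List<ClusteredPoint> kept = new ArrayList<>();
        for (ClusteredPoint p : points) {
            double[] center = centers.get(p.clusterId);
            if (center != null && distance(p.vector, center) <= maxDistance) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

The threshold is left as a caller-supplied parameter because the appropriate
cutoff is application-specific; a consumer could equally swap in a
per-cluster radius or a different outlier test without touching the
clustering code.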
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6
> Environment: windows, linux
> Reporter: Paritosh Ranjan
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: Clustering Remote Points - Two Big, Useless
> Clusters.txt, Not Clustering Remote Points - Two Meaningful Clusters.txt,
> canopy-clusterFilter-t1, canopy-outlier-elimination,
> canopy-outside-t1-points-patch-1, canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that the
> canopy returned is within distance t1 of the point. As a result, points
> outside t1 are grouped into canopies.
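
To make the reported behavior concrete, the following is a minimal sketch in
plain Java of a nearest-canopy lookup that includes the t1 check the
description says is missing. It is not Mahout's code: the class and method
names (ClosestCanopySketch, closestWithinT1) are hypothetical, canopy centers
are assumed to be plain double[] vectors, Euclidean distance is an assumed
stand-in for the configured measure, and treating "within t1" as a strict
inequality is also an assumption.

```java
import java.util.List;

// Hypothetical illustration of the reported issue; not Mahout's implementation.
final class ClosestCanopySketch {

    /** Euclidean distance; an assumed stand-in for the configured measure. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the index of the nearest canopy center, or -1 when even the
     * nearest center is not strictly closer than t1 (or the list is empty).
     * Without the final check, the nearest index would be returned no matter
     * how far away it is, which is the behavior the issue describes.
     */
    static int closestWithinT1(double[] point, List<double[]> centers, double t1) {
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centers.size(); i++) {
            double d = distance(point, centers.get(i));
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return bestDistance < t1 ? best : -1;
    }
}
```

A caller would have to decide what a return value of -1 means for its own
data, for example leaving the point unassigned.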