[
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13121158#comment-13121158
]
Sean Owen commented on MAHOUT-825:
----------------------------------
With clusterFilter set, distant points that would have formed a canopy of 1 no
longer form a canopy at all. I don't think this is the same as requesting that
those points not be clustered at all.
I don't see why these distant points affect any other canopies. They're already
farther than t1 from every other canopy, so they don't affect the others. Right?
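To make the argument concrete, here is a minimal sketch of canopy generation with a size filter applied afterwards. It is not Mahout's CanopyClusterer code. It assumes Euclidean distance, t1 > t2, and that clusterFilter drops any canopy with at most clusterFilter points; the exact threshold semantics are our assumption, not something stated in this thread. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CanopySketch {

    /** A canopy: a fixed center plus the points observed within t1 of it. */
    static final class Canopy {
        final double[] center;
        final List<double[]> points = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
            points.add(center); // the center itself counts as a member
        }
    }

    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Canopy generation (assumes t1 > t2), then removal of canopies with
     * at most clusterFilter points (assumed semantics of clusterFilter).
     */
    static List<Canopy> buildCanopies(List<double[]> input, double t1, double t2, int clusterFilter) {
        List<double[]> remaining = new ArrayList<>(input);
        List<Canopy> canopies = new ArrayList<>();
        while (!remaining.isEmpty()) {
            double[] center = remaining.remove(0);
            Canopy canopy = new Canopy(center);
            Iterator<double[]> it = remaining.iterator();
            while (it.hasNext()) {
                double[] p = it.next();
                double d = distance(center, p);
                if (d < t1) canopy.points.add(p); // loosely bound: joins this canopy
                if (d < t2) it.remove();          // tightly bound: cannot start a new canopy
            }
            canopies.add(canopy);
        }
        canopies.removeIf(c -> c.points.size() <= clusterFilter);
        return canopies;
    }

    public static void main(String[] args) {
        List<double[]> pts = List.of(
            new double[] {0.0, 0.0}, new double[] {0.5, 0.0}, new double[] {0.0, 0.5},
            new double[] {100.0, 100.0}); // far from every other point
        System.out.println(buildCanopies(pts, 3.0, 1.5, 0).size()); // 2
        System.out.println(buildCanopies(pts, 3.0, 1.5, 1).size()); // 1
    }
}
```

With clusterFilter = 1, only the singleton canopy around (100, 100) disappears. The three-point canopy is identical in both runs, because the distant point is never within t1 of its center.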
I do see the potential value in an option to drop points that are too far from
any cluster, rather than assigning them to one. I don't think that should be the
only or default behavior, though, since it's neither the current behavior nor
what "vanilla" implementations do.
But it could be a flag. It doesn't mean the same thing as clusterFilter, though,
so it would need to be a different flag. I sympathize with not wanting so many
flags; you can't have a flag for every little possible choice. This one would
have to be a flag, though. Ten flags aren't such a big deal; you don't have to
set them all. In any event, that's about the only kind of change here that I
can imagine achieving consensus.
More thoughts from other voices?
> Canopies grouping records outside t1
> ------------------------------------
>
> Key: MAHOUT-825
> URL: https://issues.apache.org/jira/browse/MAHOUT-825
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.6
> Environment: Windows, Linux
> Reporter: Paritosh Ranjan
> Labels: features, newbie, patch
> Fix For: 0.6
>
> Attachments: canopy-clusterFilter-t1, canopy-outside-t1-points-patch-1
>
>
> While finding the closest canopy, there is no check to ensure that the
> returned canopy is within distance t1 of the point. This produces incorrect
> results, i.e. points outside t1 are grouped into canopies.