[ 
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13122930#comment-13122930
 ] 

Jeff Eastman commented on MAHOUT-825:
-------------------------------------

-2 on incorporating clusterStrictness in the Canopy generation phase. This 
completely changes the semantics of canopy. Furthermore, the radius of a canopy 
is not even calculated until cleanup(), after all the points have been 
processed in the mapper.

-0.75 on using clusterStrictness in the Canopy classification phase to remove 
outliers. If we are going to add an outlier filter of this flavor then it 
should be added to all clustering classification codes, not just Canopy.

I still do not see the merit of incorporating this (experimental, untested and 
still evolving) outlier filtering scheme into the classification phase at all. 
Subsequent processing steps which read the clusteredPoints can easily also load 
the clusters and perform whatever outlier removal is best for that particular 
application. It does not need to be a separate step, and it is quite 
application specific.
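
To illustrate the kind of downstream step described above, here is a minimal 
sketch in plain Java. It does not use Mahout's API: the Cluster and 
ClusteredPoint classes, the removeOutliers method, and the reading of 
"strictness" as a multiplier on the cluster radius are all assumptions made 
for the example. A real job would read the clusters and clusteredPoints from 
their output files instead of building them in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutlierFilterSketch {

    /** Hypothetical stand-in for a loaded cluster: a center and a radius. */
    static final class Cluster {
        final double[] center;
        final double radius;

        Cluster(double[] center, double radius) {
            this.center = center;
            this.radius = radius;
        }
    }

    /** Hypothetical stand-in for one clusteredPoints record. */
    static final class ClusteredPoint {
        final int clusterId; // index into the list of clusters
        final double[] vector;

        ClusteredPoint(int clusterId, double[] vector) {
            this.clusterId = clusterId;
            this.vector = vector;
        }
    }

    /** Euclidean distance between two vectors of equal length. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Keeps only the points that lie within strictness * radius of the
     * center of the cluster they were assigned to. The threshold rule is
     * an assumption; an application would substitute its own test here.
     * Assumes every clusterId is a valid index into clusters.
     */
    static List<ClusteredPoint> removeOutliers(List<Cluster> clusters,
                                               List<ClusteredPoint> points,
                                               double strictness) {
        List<ClusteredPoint> kept = new ArrayList<ClusteredPoint>();
        for (ClusteredPoint p : points) {
            Cluster c = clusters.get(p.clusterId);
            if (distance(p.vector, c.center) <= strictness * c.radius) {
                kept.add(p);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Cluster> clusters = Arrays.asList(
            new Cluster(new double[] {0.0, 0.0}, 2.0));
        List<ClusteredPoint> points = Arrays.asList(
            new ClusteredPoint(0, new double[] {1.0, 0.0}),  // near the center
            new ClusteredPoint(0, new double[] {5.0, 5.0})); // remote point
        List<ClusteredPoint> kept = removeOutliers(clusters, points, 1.0);
        System.out.println("kept " + kept.size() + " of " + points.size());
    }
}
```

Because the filter runs after classification, the threshold rule can be 
changed per application without touching any of the clustering code.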
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: Clustering Remote Points - Two Big, Useless 
> Clusters.txt, Not Clustering Remote Points - Two Meaningful Clusters.txt, 
> canopy-clusterFilter-t1, canopy-outlier-elimination, 
> canopy-outside-t1-points-patch-1, canopy-strict-clustering-flag
>
>
> While finding the closest canopy, there is no check to ensure that it returns 
> canopies which are within distance t1 from the point. This produces an 
> incorrect result, i.e. points outside t1 are grouped into canopies.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
