[jira] [Commented] (MAHOUT-825) Canopies grouping records outside t1

Jeff Eastman (Commented) (JIRA) Mon, 03 Oct 2011 10:01:59 -0700

    [ 
https://issues.apache.org/jira/browse/MAHOUT-825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13119400#comment-13119400
 ]


Jeff Eastman commented on MAHOUT-825:
-------------------------------------

Canopy is intended to be a fast, approximate clustering algorithm. The Mahout 
sequential implementation runs a single pass over the data to produce 
approximate cluster centers. The MapReduce implementation runs one pass in each 
mapper and another pass in the reducer to combine the results from the various 
mappers. The clusters produced by the sequential and MapReduce implementations 
will be different as a result.
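
As a rough illustration of the single-pass idea and of the mapper/reducer combination described above, here is a minimal Python sketch. It is not Mahout's code: the function names (`canopy_centers`, `two_pass_centers`), the use of Euclidean distance, and the choice to report each canopy's center as the mean of the points observed within T1 are all assumptions made for the example.

```python
import math

def distance(a, b):
    # Euclidean distance; the distance measure is an assumption of this sketch.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy_centers(points, t1, t2):
    """One pass over the points; returns approximate canopy centers (t1 > t2)."""
    canopies = []  # each canopy tracks its seed point and a running sum/count
    for p in points:
        strongly_bound = False
        for c in canopies:
            d = distance(c["seed"], p)
            if d < t1:
                # Loosely bound: the point contributes to this canopy's center.
                c["sum"] = [s + x for s, x in zip(c["sum"], p)]
                c["count"] += 1
            if d < t2:
                # Strongly bound: the point will not seed a new canopy.
                strongly_bound = True
        if not strongly_bound:
            canopies.append({"seed": p, "sum": list(p), "count": 1})
    return [tuple(s / c["count"] for s in c["sum"]) for c in canopies]

def two_pass_centers(partitions, t1, t2):
    """Hypothetical stand-in for the mapper/reducer flow: one pass per
    partition, then one more pass over the resulting local centers."""
    local = [c for part in partitions for c in canopy_centers(part, t1, t2)]
    return canopy_centers(local, t1, t2)

points = [(0, 0), (1, 0), (10, 0), (11, 0)]
sequential = canopy_centers(points, t1=3, t2=2)
combined = two_pass_centers([points[::2], points[1::2]], t1=3, t2=2)
```

Because the second pass sees only the per-partition centers rather than the original points, and because canopy creation depends on the order in which points arrive, the two functions are not guaranteed to return the same centers for arbitrary data.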

Once cluster centers are determined, the classification (clustering) of points 
follows a maximum-likelihood method which assigns each point to the closest 
cluster. This proposed patch modifies that method to impose an additional 
(d<T1) criterion on cluster assignment. This can result in some of the input 
points not being classified at all. I don't view this as a step in the right 
direction, nor do I think the current behavior is an incorrect result.
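
The difference between the two assignment rules can be sketched as follows. This is a hedged illustration only: `assign_closest` and `assign_within_t1` are hypothetical names, Euclidean distance is assumed, and neither function is taken from Mahout or from the attached patch.

```python
import math

def distance(a, b):
    # Euclidean distance; assumed here for illustration only.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_closest(point, centers):
    """Closest-cluster rule: every point is assigned to its nearest center."""
    return min(range(len(centers)), key=lambda i: distance(point, centers[i]))

def assign_within_t1(point, centers, t1):
    """Rule with the extra (d < T1) criterion: a point farther than t1 from
    its nearest center gets no assignment at all (None)."""
    i = assign_closest(point, centers)
    return i if distance(point, centers[i]) < t1 else None

centers = [(0, 0), (10, 0)]
near, far = (1, 0), (4, 0)
closest_results = [assign_closest(p, centers) for p in (near, far)]
restricted_results = [assign_within_t1(p, centers, t1=3) for p in (near, far)]
```

Under the first rule both points receive a cluster index; under the second, the point at distance 4 from its nearest center is left unclassified, which is the behavior the comment objects to.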

-1 I'm inclined to reject this patch for these reasons.
                
> Canopies grouping records outside t1
> ------------------------------------
>
>                 Key: MAHOUT-825
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-825
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.6
>         Environment: windows, linux
>            Reporter: Paritosh Ranjan
>              Labels: features, newbie, patch
>             Fix For: 0.6
>
>         Attachments: canopy-outside-t1-points-patch-1
>
>
> While finding the closest canopy, there is no check to ensure that it returns 
> canopies which are within distance t1 from the point. This produces an 
> incorrect result, i.e. points outside t1 are grouped in canopies.

