[ 
https://issues.apache.org/jira/browse/MAHOUT-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639691#action_12639691
 ] 

Jeff Eastman commented on MAHOUT-82:
------------------------------------

This assertion needs a lot more justification before I would agree with it. The 
canopy reducer obtains cluster centroids from many mappers - each seeing only a 
portion of the input data - and attempts to coalesce them. Each mapper/combiner 
generates its own independent set of canopies and so there would be no common 
canopyIds to use in the reducer.
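
To make the point concrete, here is a minimal sketch in Python. It is not 
Mahout's actual code: it assumes a single distance threshold (t2), uses the 
first point of each canopy as its center instead of a computed centroid, and 
all function names are hypothetical. It only illustrates that canopy ids 
assigned inside one mapper have no meaning in another, so the reducer has to 
coalesce by distance.

```python
import math

def canopy_centers(points, t2):
    """Greedy canopy pass: a point starts a new canopy unless it lies
    within t2 of an existing canopy center (simplifying assumption)."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > t2 for c in centers):
            centers.append(p)
    return centers

def map_split(points, t2):
    # Each mapper sees only its own split, so ids are just local indices.
    return dict(enumerate(canopy_centers(points, t2)))

def reduce_centers(per_mapper, t2):
    # Pool every mapper's centers and coalesce them by running the same
    # canopy pass over the centers themselves; local ids are ignored.
    pooled = [c for canopies in per_mapper for c in canopies.values()]
    return canopy_centers(pooled, t2)

split_a = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)]
split_b = [(10.2, 10.0), (0.1, 0.1)]
mapped = [map_split(split_a, 3.0), map_split(split_b, 3.0)]
final_centers = reduce_centers(mapped, 3.0)
```

In this toy input, local id 0 refers to a canopy near the origin in the first 
mapper and to one near (10, 10) in the second, so grouping the intermediate 
records by that id would merge unrelated canopies.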

> Canopy map intermediate file structure should be keyed by canopyId.
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-82
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-82
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Edward J. Yoon
>             Fix For: 0.1
>
>
>  When emitting a point to the collector, it should be keyed by canopyId without 
> the computed centroid (or use another key datum instead of hadoop.IO.Text).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
