[
https://issues.apache.org/jira/browse/MAHOUT-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639691#action_12639691
]
Jeff Eastman commented on MAHOUT-82:
------------------------------------
This assertion needs a lot more justification before I would agree with it. The
canopy reducer obtains cluster centroids from many mappers - each seeing only a
portion of the input data - and attempts to coalesce them. Each mapper/combiner
generates its own independent set of canopies and so there would be no common
canopyIds to use in the reducer.
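To make the argument concrete, here is a minimal Python simulation of the canopy job as described above. It is not Mahout's Java code, and the details are assumptions: Euclidean distance, thresholds T1 > T2, and a single shared intermediate key named "centroid". The helpers canopy_pass, map_split and reduce_centroids are hypothetical names used only for illustration. Each simulated mapper numbers its canopies from 0. As a result, canopy 0 in one mapper and canopy 0 in another refer to unrelated centroids, and the reducer can only coalesce them by distance.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


class Canopy:
    """One canopy: a fixed center plus a running sum used for its centroid."""

    def __init__(self, canopy_id: int, center: Point) -> None:
        self.canopy_id = canopy_id  # unique only within the process that made it
        self.center = center
        self.total = list(center)
        self.count = 1

    def add(self, p: Point) -> None:
        self.total = [s + x for s, x in zip(self.total, p)]
        self.count += 1

    def centroid(self) -> Point:
        return tuple(s / self.count for s in self.total)


def canopy_pass(points: Sequence[Point], t1: float, t2: float) -> List[Canopy]:
    """Add each point to every canopy whose center is within t1; start a new
    canopy at the point unless some existing canopy is within t2 (t1 > t2)."""
    canopies: List[Canopy] = []
    for p in points:
        strongly_bound = False
        for c in canopies:
            d = math.dist(p, c.center)
            if d < t1:
                c.add(p)
            if d < t2:
                strongly_bound = True
        if not strongly_bound:
            # Ids come from a local counter, so every mapper starts again at 0.
            canopies.append(Canopy(len(canopies), p))
    return canopies


def map_split(split: Sequence[Point], t1: float, t2: float):
    """Hypothetical mapper: cluster one input split and emit every centroid
    under one shared key, so a single reducer receives all of them."""
    return [("centroid", c.centroid()) for c in canopy_pass(split, t1, t2)]


def reduce_centroids(centroids: Sequence[Point], t1: float, t2: float):
    """Hypothetical reducer: coalesce centroids from all mappers by running the
    same canopy pass over them. Merging is by distance, not by canopy id."""
    return [c.centroid() for c in canopy_pass(centroids, t1, t2)]


T1, T2 = 3.0, 1.0
split_a = [(0.0, 0.0), (0.1, 0.0), (10.0, 10.0)]
split_b = [(10.1, 10.0), (0.0, 0.2), (10.0, 10.2)]

# Canopy 0 lies near the origin in split A but near (10, 10) in split B.
for name, split in (("A", split_a), ("B", split_b)):
    for c in canopy_pass(split, T1, T2):
        print(name, c.canopy_id, c.centroid())

emitted = map_split(split_a, T1, T2) + map_split(split_b, T1, T2)
print(reduce_centroids([v for _, v in emitted], T1, T2))
```

Canopy assignment depends on input order, and Hadoop does not guarantee the order in which a reducer sees the values for a key. The merged centroids can therefore vary somewhat between runs. The sketch only illustrates that the reducer has to merge by distance, because the per-mapper ids carry no shared meaning.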
> Canopy map intermediate file structure should be keyed by canopyId.
> -------------------------------------------------------------------
>
> Key: MAHOUT-82
> URL: https://issues.apache.org/jira/browse/MAHOUT-82
> Project: Mahout
> Issue Type: Bug
> Components: Clustering
> Affects Versions: 0.1
> Reporter: Edward J. Yoon
> Fix For: 0.1
>
>
> When emitting a point to the collector, it should be keyed by canopyId alone,
> without the computed centroid (or use another key type instead of hadoop.io.Text).
--
This message is automatically generated by JIRA.