Also, the combiner may not be called at all. If its output types differ from the map output types, that makes the types get buggered.
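For context: Hadoop treats a combiner as an optional optimization and may run it zero, one, or several times on the map output. The job is therefore only safe if the combiner emits the same key/value types it consumes. Here is a minimal plain-Python sketch of that constraint, not Hadoop or Mahout code. All names (`map_points`, `combine`, `reduce_mean`, `run_job`) are hypothetical, and the per-key mean merely stands in for coalescing centroids.

```python
from collections import defaultdict


def map_points(points):
    """Mapper: emit (key, (partial_sum, count)) for every input point."""
    for key, value in points:
        yield key, (value, 1)


def combine(key, values):
    """Combiner: consumes and emits (partial_sum, count), the map output type."""
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return key, (total, count)


def reduce_mean(key, values):
    """Reducer: must accept raw map output as well as combined output."""
    total = sum(s for s, _ in values)
    count = sum(c for _, c in values)
    return key, total / count


def run_job(splits, combiner_passes):
    """Simulate a job in which the combiner runs `combiner_passes` times per split."""
    shuffled = defaultdict(list)
    for split in splits:
        grouped = defaultdict(list)
        for key, value in map_points(split):
            grouped[key].append(value)
        # The framework is free to choose 0, 1, or more passes here.
        for _ in range(combiner_passes):
            grouped = {k: [combine(k, vs)[1]] for k, vs in grouped.items()}
        for k, vs in grouped.items():
            shuffled[k].extend(vs)
    return dict(reduce_mean(k, vs) for k, vs in shuffled.items())


splits = [[("a", 1.0), ("a", 3.0)], [("a", 5.0), ("b", 2.0)]]
results = [run_job(splits, passes) for passes in (0, 1, 2)]
```

All three entries in `results` are the same, because the reducer sees the same value type whether or not the combiner ran. If `combine` emitted a different type than `map_points` does, the zero-pass case would hand the reducer values it does not expect.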
On Tue, Oct 14, 2008 at 7:04 PM, Jeff Eastman (JIRA) <[EMAIL PROTECTED]> wrote:

>
> [ https://issues.apache.org/jira/browse/MAHOUT-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12639691#action_12639691 ]
>
> Jeff Eastman commented on MAHOUT-82:
> ------------------------------------
>
> This assertion needs a lot more justification before I would agree with it.
> The canopy reducer obtains cluster centroids from many mappers - each seeing
> only a portion of the input data - and attempts to coalesce them. Each
> mapper/combiner generates its own independent set of canopies and so there
> would be no common canopyIds to use in the reducer.
>
> > Canopy map intermediate file structure should be keyed by canopyId.
> > -------------------------------------------------------------------
> >
> >                 Key: MAHOUT-82
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-82
> >             Project: Mahout
> >          Issue Type: Bug
> >          Components: Clustering
> >    Affects Versions: 0.1
> >            Reporter: Edward J. Yoon
> >             Fix For: 0.1
> >
> >
> > When emit the point to the collector, it should be keyed by canopyId w/o
> > computed centroid. (or make a other key datum instead of hadoop.IO.Text)
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
ted
