[jira] Commented: (MAHOUT-82) Canopy map intermediate file structure should be keyed by canopyId.

Jeff Eastman (JIRA) Fri, 17 Oct 2008 09:15:07 -0700

    [ https://issues.apache.org/jira/browse/MAHOUT-82?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640559#action_12640559 ]


Jeff Eastman commented on MAHOUT-82:
------------------------------------

I applied the patch and the unit tests continue to pass. The change affects the 
communication between the mappers and their combiners, not between the 
combiners and the reducer, so my earlier comment referred to a different 
interface. In this instance, the mapper records *can* be keyed by the canopyId 
alone, since they share a common id-space. The current implementation passes 
the entire canopy, including its original center, as the key, but this 
information is not used when summing points to compute the centroid that is 
output to the reducer. The key only serves to group the points of each canopy 
during the copy/sort phase before the summing; the summing itself never reads 
it.
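
To illustrate the point, here is a minimal sketch in plain Java, with no 
Hadoop dependency. It is not the Mahout code or Edward's patch: the class 
CanopyKeySketch, the Emit record type, the sumByKey method and the "C0"-style 
ids are all hypothetical names made up for this example. It only shows that 
grouping by the id alone yields the same per-canopy sums as grouping by the id 
plus the center.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CanopyKeySketch {

    /** Hypothetical mapper output record: a key plus the point assigned to that canopy. */
    public static class Emit {
        public final String key;
        public final double[] point;

        public Emit(String key, double[] point) {
            this.key = key;
            this.point = point;
        }
    }

    /**
     * Stand-in for the copy/sort and combine steps: group records by key,
     * then sum the points of each group. The key is used only for grouping;
     * its contents never enter the sum.
     */
    public static Map<String, double[]> sumByKey(List<Emit> emits) {
        Map<String, double[]> sums = new TreeMap<>();
        for (Emit e : emits) {
            double[] acc = sums.computeIfAbsent(e.key, k -> new double[e.point.length]);
            for (int i = 0; i < acc.length; i++) {
                acc[i] += e.point[i];
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] points = {{1.0, 1.0}, {2.0, 1.0}, {8.0, 9.0}};
        int[] canopyOf = {0, 0, 1};                       // canopy each point fell into
        String[] centers = {"[1.0, 1.0]", "[8.0, 9.0]"};  // original canopy centers

        List<Emit> longKeys = new ArrayList<>();   // key = id plus center (assumed old form)
        List<Emit> shortKeys = new ArrayList<>();  // key = id only (assumed new form)
        for (int i = 0; i < points.length; i++) {
            int id = canopyOf[i];
            longKeys.add(new Emit("C" + id + ":" + centers[id], points[i]));
            shortKeys.add(new Emit("C" + id, points[i]));
        }

        // Both maps iterate in canopy order, so the sums can be compared pairwise.
        List<double[]> a = new ArrayList<>(sumByKey(longKeys).values());
        List<double[]> b = new ArrayList<>(sumByKey(shortKeys).values());
        for (int i = 0; i < a.size(); i++) {
            System.out.println(Arrays.toString(a.get(i)) + " vs " + Arrays.toString(b.get(i))
                    + " equal=" + Arrays.equals(a.get(i), b.get(i)));
        }
    }
}
```

Running main prints one line per canopy comparing the two sums. The only 
difference between the two runs is the length of the key strings that would be 
carried through the sort.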

I think Edward's patch represents a small performance improvement, in that 
shorter keys would presumably be faster to serialize and sort than longer ones.

+1
Jeff

> Canopy map intermediate file structure should be keyed by canopyId.
> -------------------------------------------------------------------
>
>                 Key: MAHOUT-82
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-82
>             Project: Mahout
>          Issue Type: Bug
>          Components: Clustering
>    Affects Versions: 0.1
>            Reporter: Edward J. Yoon
>             Fix For: 0.1
>
>         Attachments: MAHOUT-82.patch
>
>
>  When emitting the point to the collector, it should be keyed by canopyId 
> without the computed centroid (or use another key type instead of 
> hadoop.io.Text).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
