[ 
https://issues.apache.org/jira/browse/HADOOP-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HADOOP-2399:
----------------------------------

    Attachment: 2399-3.patch

This patch fixes the value iterator to reuse the key and value objects between 
iterations. Aggregation was assuming that the reduce inputs were not reused, 
so I stringified the value. Is that OK, Runping? I got a minor speedup: 2:33 
instead of 2:37 on a simple 1-node word count.
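
To illustrate why code that holds references to reduce inputs breaks once 
the objects are reused, here is a minimal, self-contained sketch. It does not 
use the Hadoop API: MutableText, ReusingIterator, and both helper methods are 
hypothetical names made up for this example, and the iterator only imitates 
the reuse behavior described here. It is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReuseDemo {
    /** Hypothetical mutable value, standing in for a Writable-style object. */
    static final class MutableText {
        private String value = "";
        void set(String v) { value = v; }
        @Override public String toString() { return value; }
    }

    /** Hypothetical iterator that hands back the SAME instance on every next(). */
    static final class ReusingIterator implements Iterator<MutableText> {
        private final Iterator<String> source;
        private final MutableText shared = new MutableText();
        ReusingIterator(List<String> raw) { source = raw.iterator(); }
        @Override public boolean hasNext() { return source.hasNext(); }
        @Override public MutableText next() {
            shared.set(source.next()); // overwrite in place instead of allocating
            return shared;
        }
    }

    /** Broken pattern: keeps references, so every entry aliases one object. */
    static List<String> holdReferences(List<String> raw) {
        List<MutableText> held = new ArrayList<>();
        Iterator<MutableText> it = new ReusingIterator(raw);
        while (it.hasNext()) {
            held.add(it.next());
        }
        List<String> out = new ArrayList<>();
        for (MutableText t : held) {
            out.add(t.toString());
        }
        return out;
    }

    /** Safe pattern: snapshot each value (here, as a String) before advancing. */
    static List<String> stringify(List<String> raw) {
        List<String> out = new ArrayList<>();
        Iterator<MutableText> it = new ReusingIterator(raw);
        while (it.hasNext()) {
            out.add(it.next().toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> raw = Arrays.asList("a", "b", "c");
        System.out.println(holdReferences(raw)); // prints [c, c, c]
        System.out.println(stringify(raw));      // prints [a, b, c]
    }
}
```

The second method mirrors the "stringify the value" approach: taking an 
immutable copy before the next call to next() keeps each value intact, at the 
cost of one allocation per value that is actually retained.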

> Input key and value to combiner and reducer should be reused
> ------------------------------------------------------------
>
>                 Key: HADOOP-2399
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2399
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.15.1
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: 2399-3.patch, reuse-obj-2.patch, reuse-obj.patch
>
>
> Currently, the input key and value are recreated on every iteration for input 
> to the combiner and reducer. It would speed up the system substantially if we 
> reused the keys and values. The downside of doing so is that it may break 
> applications that count on holding references to previous keys and values, 
> but I think it is worth doing.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
