[jira] Updated: (HADOOP-3366) Shuffle/Merge improvements

Devaraj Das (JIRA) Fri, 16 May 2008 06:10:22 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Devaraj Das updated HADOOP-3366:
--------------------------------

    Attachment: 3366.1.patch

(An offline discussion led me to accept the suggestion that we should not 
have the file abstraction for the in-memory merge. The file streams add 
overhead, which is undesirable in a performance-critical section.)
This half-done patch is up for a high-level review. It introduces a 
ByteArrayManager that the shuffle can use to store map outputs as raw byte 
arrays instead of as files in the ramfs. It also defines a merge routine that 
can merge a set of such byte arrays. The remaining work, i.e., changing the 
shuffle code to use the ByteArrayManager instead of the ramfs, partly depends 
on the patch for HADOOP-2095 (since that patch changes the layout of the 
intermediate sequence file). I'll see what else can be done without that patch 
being available.
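To make the idea concrete, here is a minimal sketch of an in-memory store that 
keeps shuffled map outputs as plain byte arrays under a memory budget. The 
class name InMemorySegmentStore, the map-id keys, the fixed byte budget and 
the fall-back-to-disk behaviour are all assumptions for illustration; this is 
not the ByteArrayManager from 3366.1.patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (not the class in 3366.1.patch): keeps each fetched map
// output as a raw byte[] under a fixed memory budget instead of a ramfs file.
class InMemorySegmentStore {
    private final long capacityBytes;   // assumed total budget for in-memory map outputs
    private long usedBytes = 0;
    private final Map<String, byte[]> segments = new HashMap<>();

    InMemorySegmentStore(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Stores a map output if it fits in the remaining budget; returns false otherwise. */
    synchronized boolean tryStore(String mapId, byte[] data) {
        if (segments.containsKey(mapId)) {
            throw new IllegalArgumentException("duplicate map output: " + mapId);
        }
        if (usedBytes + data.length > capacityBytes) {
            return false; // assumption: the caller would shuffle this output to disk instead
        }
        segments.put(mapId, data);
        usedBytes += data.length;
        return true;
    }

    /** Removes a segment (e.g. once it has been merged) and frees its share of the budget. */
    synchronized byte[] release(String mapId) {
        byte[] data = segments.remove(mapId);
        if (data != null) {
            usedBytes -= data.length;
        }
        return data;
    }

    synchronized long usedBytes() { return usedBytes; }

    synchronized int segmentCount() { return segments.size(); }
}
```

Working directly on byte[] avoids the stream and file-system layers that the 
offline discussion flagged as overhead in this performance-critical path.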

By the way, I have written the patch assuming the layout 
<key-len><val-len><key><value>   (the difference w.r.t. the earlier proposed 
layout is that the two lengths are adjacent). That made parsing the byte 
arrays simpler. 
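A rough sketch of why keeping the lengths together helps: a reader can pull 
both lengths in one step and then slice out the key and value, and a merge 
routine can walk several such arrays with a min-heap. This assumes 4-byte 
big-endian lengths, segments already sorted by key, and unsigned-byte key 
ordering; RecordLayout and its methods are hypothetical names, not code from 
the patch or from HADOOP-2095.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of <key-len><val-len><key><value> records and a k-way merge.
// Assumptions: 4-byte big-endian lengths, each segment sorted by key,
// keys ordered as unsigned bytes.
final class RecordLayout {

    /** Appends one record; both lengths are written before the key and value bytes. */
    static void write(ByteArrayOutputStream out, byte[] key, byte[] value) {
        byte[] lengths = ByteBuffer.allocate(8).putInt(key.length).putInt(value.length).array();
        out.write(lengths, 0, lengths.length);
        out.write(key, 0, key.length);
        out.write(value, 0, value.length);
    }

    /** Cursor over one segment: reads both lengths at once, then slices key and value. */
    static final class Cursor {
        private final ByteBuffer buf;
        byte[] key;
        byte[] value;

        Cursor(byte[] segment) { buf = ByteBuffer.wrap(segment); }

        /** Advances to the next record; returns false at the end of the segment. */
        boolean next() {
            if (!buf.hasRemaining()) return false;
            int keyLen = buf.getInt();
            int valLen = buf.getInt();
            key = new byte[keyLen];
            value = new byte[valLen];
            buf.get(key);
            buf.get(value);
            return true;
        }
    }

    /** Lexicographic comparison treating bytes as unsigned. */
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int c = (a[i] & 0xff) - (b[i] & 0xff);
            if (c != 0) return c;
        }
        return a.length - b.length;
    }

    /** Merges sorted segments into one sorted segment via a min-heap on each cursor's current key. */
    static byte[] merge(List<byte[]> segments) {
        PriorityQueue<Cursor> heap = new PriorityQueue<>((x, y) -> compareKeys(x.key, y.key));
        for (byte[] s : segments) {
            Cursor c = new Cursor(s);
            if (c.next()) heap.add(c);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();          // smallest current key across all segments
            write(out, c.key, c.value);
            if (c.next()) heap.add(c);       // re-insert with its next record, if any
        }
        return out.toByteArray();
    }
}
```

Because both lengths sit at the front of each record, the cursor knows the 
full record size after a single 8-byte read, with no need to interleave 
length parsing with the key bytes.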

> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>         Attachments: 3366.1.patch
>
>
> This is intended to be a meta-issue to track various improvements to 
> shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
