[jira] Commented: (HADOOP-3366) Shuffle/Merge improvements

Devaraj Das (JIRA) Tue, 13 May 2008 05:57:19 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596385#action_12596385
 ]


Devaraj Das commented on HADOOP-3366:
-------------------------------------

I agree with 1 through 3.

bq. 4. Throw away RamFS, implement a simple manager who returns byte-arrays of 
a given size (i.e. decompressed shuffle split) until it runs out of the amount 
of memory available.
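
For reference, a manager of the kind described in item 4 might look roughly like the sketch below. The names ByteArrayManager, reserve, release and available are hypothetical and do not come from the Hadoop source; the fallback to disk when reserve returns null is likewise an assumption about how a caller would use it.

```java
// Hypothetical sketch only: not an existing Hadoop class.
final class ByteArrayManager {
    private final long capacityBytes; // total in-memory budget for shuffle data
    private long usedBytes;           // bytes currently handed out to callers

    ByteArrayManager(long capacityBytes) {
        if (capacityBytes < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacityBytes = capacityBytes;
    }

    /**
     * Returns a byte array of exactly the requested size, or null when the
     * request would exceed the remaining budget (the caller is then assumed
     * to spill that map output to disk instead).
     */
    synchronized byte[] reserve(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (usedBytes + size > capacityBytes) {
            return null;
        }
        usedBytes += size;
        return new byte[size];
    }

    /**
     * Gives a buffer's bytes back to the budget. This sketch trusts the
     * caller to release each buffer exactly once.
     */
    synchronized void release(byte[] buffer) {
        usedBytes -= buffer.length;
    }

    /** Bytes still available under the budget. */
    synchronized long available() {
        return capacityBytes - usedBytes;
    }
}
```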

I am not sure this is justified. I'd propose

1) Make the InMemoryFileSystem independent of the ChecksumFileSystem
2) Implement special DataOutputBuffer/ValueBytes for the ramfs. The 
DataOutputBuffer gives us a nice abstraction to look at data, be it from files 
or memory. I think we should retain that abstraction and handle the ramfs as a 
special case.
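
To illustrate the abstraction in 2), here is a minimal sketch using only java.io. The interface ValueSource and the classes InMemoryValue and StreamValue are hypothetical names invented for this example, not Hadoop APIs. The idea is that merge code writes a value through one interface, while the ramfs-backed implementation hands over its bytes directly instead of going through a stream.

```java
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical interface: one view of a value, wherever its bytes live.
interface ValueSource {
    int length();
    void writeTo(DataOutputStream out) throws IOException;
}

// Special case for data already in memory: write the slice directly.
final class InMemoryValue implements ValueSource {
    private final byte[] data;
    private final int offset;
    private final int length;

    InMemoryValue(byte[] data, int offset, int length) {
        this.data = data;
        this.offset = offset;
        this.length = length;
    }

    public int length() { return length; }

    public void writeTo(DataOutputStream out) throws IOException {
        out.write(data, offset, length); // no intermediate stream or copy buffer
    }
}

// General case: the value is read from a stream (e.g. a file on disk).
final class StreamValue implements ValueSource {
    private final InputStream in;
    private final int length;

    StreamValue(InputStream in, int length) {
        this.in = in;
        this.length = length;
    }

    public int length() { return length; }

    public void writeTo(DataOutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        int remaining = length;
        while (remaining > 0) {
            int n = in.read(chunk, 0, Math.min(chunk.length, remaining));
            if (n < 0) {
                throw new EOFException("stream ended with " + remaining + " bytes unread");
            }
            out.write(chunk, 0, n);
            remaining -= n;
        }
    }
}
```

Code that consumes a ValueSource does not need to know which implementation it holds, so the ramfs path can be optimized without touching the merge logic.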

We already use raw comparators. Not sure what you meant by this.

I'll submit a patch with some of the above thoughts implemented in a bit.


> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>
> This is intended to be a meta-issue to track various improvements to 
> shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
