[ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596385#action_12596385 ]
Devaraj Das commented on HADOOP-3366:
-------------------------------------
I agree with 1 through 3.
bq. 4. Throw away RamFS, implement a simple manager who returns byte-arrays of
a given size (i.e. decompressed shuffle split) until it runs out of the amount
of memory available.
I am not sure this is justified. I'd propose
1) Make the InMemoryFileSystem independent of the ChecksumFileSystem
2) Implement special DataOutputBuffer/ValueBytes for the ramfs. The
DataOutputBuffer gives us a nice abstraction to look at data, be it from files
or memory. I think we should retain that abstraction and handle the ramfs as a
special case.
We already use raw comparators. Not sure what you meant by this.
I'll submit a patch with some of the above thoughts implemented in a bit.
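For comparison, the manager described in item 4 could look roughly like the sketch below. This is only an illustration under stated assumptions: the class name ShuffleBufferManager and the methods reserve, release and available are hypothetical and do not exist in Hadoop. The manager hands out byte arrays of the requested size until a fixed memory budget is used up, at which point it returns null so the caller can fall back to disk.

```java
/**
 * Hypothetical sketch of the "simple manager" from item 4.
 * Not Hadoop code; all names here are made up for illustration.
 */
class ShuffleBufferManager {
    private final long capacityBytes; // total memory budget for in-memory shuffle data
    private long usedBytes = 0;       // bytes currently handed out

    ShuffleBufferManager(long capacityBytes) {
        if (capacityBytes < 0) {
            throw new IllegalArgumentException("capacity must be non-negative");
        }
        this.capacityBytes = capacityBytes;
    }

    /**
     * Returns a byte array of exactly 'size' bytes (e.g. the decompressed
     * size of one shuffle split), or null if the budget cannot cover it.
     */
    synchronized byte[] reserve(int size) {
        if (size < 0) {
            throw new IllegalArgumentException("size must be non-negative");
        }
        if (size > capacityBytes - usedBytes) {
            return null; // out of budget: caller should spill to disk instead
        }
        usedBytes += size;
        return new byte[size];
    }

    /**
     * Gives the budget for a buffer back. Only pass buffers obtained from
     * reserve(); anything else would corrupt the accounting.
     */
    synchronized void release(byte[] buffer) {
        usedBytes -= buffer.length;
    }

    /** Bytes still available within the budget. */
    synchronized long available() {
        return capacityBytes - usedBytes;
    }
}
```

A caller would try reserve(decompressedLength) for each fetched map output, copy into the array on success, and call release once the in-memory merge has consumed it. Note that this gives up the file-like abstraction that the ramfs provides, which is the concern raised above.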
> Shuffle/Merge improvements
> --------------------------
>
> Key: HADOOP-3366
> URL: https://issues.apache.org/jira/browse/HADOOP-3366
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Fix For: 0.18.0
>
>
> This is intended to be a meta-issue to track various improvements to
> shuffle/merge in the reducer.