[ https://issues.apache.org/jira/browse/HADOOP-5494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Devaraj Das updated HADOOP-5494:
--------------------------------

    Attachment: 5494-3.patch

Attaching an updated patch after some large-scale tests. The earlier patch had a bug related to reusing the "value" DataInputBuffer across in-memory and on-disk segments. Thanks to Chris for helping me trace this bug. No negative performance impact was observed in my runs.

> IFile.Reader should have a nextRawKey/nextRawValue
> --------------------------------------------------
>
>                 Key: HADOOP-5494
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5494
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Devaraj Das
>             Fix For: 0.21.0
>
>         Attachments: 5494-1.patch, 5494-2.patch, 5494-3.patch
>
>
> Merger.Segment has only the next() method defined, which internally calls
> next(key, value) on the underlying IFile stream. This reads both the key
> and the value bytes. It would be good to have Merger.Segment.nextRawKey(),
> which would read only the key and delay reading the value until it is needed
> (in Merger.MergeQueue.next()) via a new method, Merger.Segment.nextRawValue().
> This would mean that we load only one value's bytes at a time, and hence
> could have a much smaller memory footprint (depending on how big the
> values are).
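The lazy key/value reading proposed in the issue can be sketched as a small k-way merge. This is a minimal, self-contained Java illustration under stated assumptions, not the HADOOP-5494 patch: the record layout ([int keyLen][int valueLen][key][value], with keyLen == -1 as an end-of-segment marker) and the names LazyMergeSketch, Segment, encode and merge are hypothetical stand-ins for IFile.Reader and Merger.Segment. Each segment keeps only its current key buffered; value bytes are read (nextRawValue) only for the record the priority queue selects, and are skipped if a segment advances without reading them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

import static java.nio.charset.StandardCharsets.UTF_8;

public class LazyMergeSketch {

  /** Hypothetical stand-in for Merger.Segment over an in-memory byte stream. */
  static final class Segment {
    private final DataInputStream in;
    private byte[] key;
    private int pendingValueLen; // value bytes of the current record not yet consumed

    Segment(byte[] data) {
      in = new DataInputStream(new ByteArrayInputStream(data));
    }

    /** Reads only the next key; returns false at the (hypothetical) EOF marker. */
    boolean nextRawKey() throws IOException {
      // If the caller never asked for the previous value, skip over its bytes.
      int remaining = pendingValueLen;
      while (remaining > 0) {
        int n = in.skipBytes(remaining);
        if (n <= 0) throw new EOFException("truncated value");
        remaining -= n;
      }
      pendingValueLen = 0;
      int keyLen = in.readInt();
      if (keyLen < 0) return false; // end-of-segment marker
      pendingValueLen = in.readInt();
      key = new byte[keyLen];
      in.readFully(key);
      return true;
    }

    /** Reads the value of the record whose key nextRawKey() just returned. */
    byte[] nextRawValue() throws IOException {
      byte[] value = new byte[pendingValueLen];
      in.readFully(value);
      pendingValueLen = 0;
      return value;
    }

    byte[] key() { return key; }
  }

  /** Writes key/value pairs in the hypothetical layout, then the EOF marker. */
  static byte[] encode(String... kvs) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    for (int i = 0; i < kvs.length; i += 2) {
      byte[] k = kvs[i].getBytes(UTF_8);
      byte[] v = kvs[i + 1].getBytes(UTF_8);
      out.writeInt(k.length);
      out.writeInt(v.length);
      out.write(k);
      out.write(v);
    }
    out.writeInt(-1);
    out.flush();
    return bytes.toByteArray();
  }

  /** k-way merge ordered by raw key bytes; values are read only for the popped record. */
  static List<String> merge(List<Segment> segments) throws IOException {
    PriorityQueue<Segment> queue =
        new PriorityQueue<>((a, b) -> Arrays.compareUnsigned(a.key(), b.key()));
    for (Segment s : segments) {
      if (s.nextRawKey()) queue.add(s);
    }
    List<String> out = new ArrayList<>();
    while (!queue.isEmpty()) {
      Segment top = queue.poll();
      byte[] value = top.nextRawValue(); // the only value bytes read in this step
      out.add(new String(top.key(), UTF_8) + "=" + new String(value, UTF_8));
      if (top.nextRawKey()) queue.add(top); // re-queue with its next key only
    }
    return out;
  }

  public static void main(String[] args) throws IOException {
    List<Segment> segments = List.of(
        new Segment(encode("a", "1", "c", "3")),
        new Segment(encode("b", "2", "d", "4")));
    System.out.println(merge(segments)); // prints [a=1, b=2, c=3, d=4]
  }
}
```

The sketch allocates a fresh array per key and value for clarity; Hadoop's own reader works with reusable buffers such as DataInputBuffer instead, so a real implementation has to track which buffer each segment's current value lives in.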