[
https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580740#action_12580740
]
Hadoop QA commented on HADOOP-2919:
-----------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12378286/2919-6.patch
against trunk revision 619744.
@author +1. The patch does not contain any @author tags.
tests included +1. The patch appears to include 3 new or modified tests.
javadoc +1. The javadoc tool did not generate any warning messages.
javac +1. The applied patch does not generate any new javac compiler
warnings.
release audit +1. The applied patch does not generate any new release
audit warnings.
findbugs +1. The patch does not introduce any new Findbugs warnings.
core tests -1. The patch failed core unit tests.
contrib tests +1. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2010/console
This message is automatically generated.
> Create fewer copies of buffer data during sort/spill
> ----------------------------------------------------
>
> Key: HADOOP-2919
> URL: https://issues.apache.org/jira/browse/HADOOP-2919
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Chris Douglas
> Assignee: Chris Douglas
> Fix For: 0.17.0
>
> Attachments: 2919-0.patch, 2919-1.patch, 2919-2.patch, 2919-3.patch,
> 2919-4.patch, 2919-5.patch, 2919-6.patch
>
>
> Currently, the sort/spill works as follows:
> Let r be the number of partitions
> For each call to collect(K,V) from map:
> * If buffers do not exist, allocate a new DataOutputBuffer to collect K,V
> bytes, allocate r buffers for collecting K,V offsets
> * Write K,V into buffer, noting offsets
> * Register offsets with associated partition buffer, allocating/copying
> accounting buffers if necessary
> * Calculate the total mem usage for buffer and all partition collectors by
> iterating over the collectors
> * If total mem usage is greater than half of io.sort.mb, then start a new
> thread to spill, blocking if another spill is in progress
> For each spill (assuming no combiner):
> * Save references to our K,V byte buffer and accounting data, setting the
> former to null (will be recreated on the next call to collect(K,V))
> * Open a SequenceFile.Writer for this partition
> * Sort each partition separately (the current version of sort reuses, but
> still requires wrapping, indices in IntWritable objects)
> * Build a RawKeyValueIterator of sorted data for the partition
> * Deserialize each key and value and call SequenceFile::append(K,V) on the
> writer for this partition
> There are a number of opportunities for reducing the number of copies,
> creations, and operations we perform in this stage, particularly since
> growing many of the buffers involved requires that we copy the existing data
> to the newly sized allocation.
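
The collect-side accounting described above can be sketched roughly as follows. This is a minimal illustration, not the MapTask code touched by the patch: the class CollectSketch and its methods are hypothetical, a ByteArrayOutputStream stands in for DataOutputBuffer, each record is assumed to cost three int offsets of accounting, and the spill runs synchronously rather than on a separate thread.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the collect/spill accounting; not Hadoop code. */
public class CollectSketch {
    private final int partitions;           // "r" in the description
    private final long spillThresholdBytes; // half of the sort buffer size
    private ByteArrayOutputStream kvBytes;  // stand-in for DataOutputBuffer
    private List<List<int[]>> offsets;      // per partition: {keyStart, valStart, end}
    private int spills = 0;

    public CollectSketch(int partitions, long sortBufferBytes) {
        this.partitions = partitions;
        this.spillThresholdBytes = sortBufferBytes / 2;
    }

    public void collect(byte[] key, byte[] value, int partition) {
        // Buffers are created lazily; a spill sets them back to null.
        if (kvBytes == null) {
            kvBytes = new ByteArrayOutputStream();
            offsets = new ArrayList<>();
            for (int i = 0; i < partitions; i++) {
                offsets.add(new ArrayList<>());
            }
        }
        // Write K,V into the byte buffer, noting offsets. The backing
        // array is copied whenever it has to grow.
        int keyStart = kvBytes.size();
        kvBytes.write(key, 0, key.length);
        int valStart = kvBytes.size();
        kvBytes.write(value, 0, value.length);
        offsets.get(partition).add(new int[] {keyStart, valStart, kvBytes.size()});

        // Total usage is recomputed by iterating over every collector.
        if (memoryUsage() > spillThresholdBytes) {
            spill();
        }
    }

    public long memoryUsage() {
        if (kvBytes == null) {
            return 0;
        }
        long total = kvBytes.size();
        for (List<int[]> perPartition : offsets) {
            total += perPartition.size() * 3L * Integer.BYTES;
        }
        return total;
    }

    private void spill() {
        // A real spill would sort and write each partition; here the
        // buffers are only released so the next collect reallocates them.
        kvBytes = null;
        offsets = null;
        spills++;
    }

    public int spillCount() {
        return spills;
    }
}
```

Even in this simplified form, every call to collect walks all r offset lists to total the memory in use, and growing the byte buffer copies everything written so far, which are two of the costs the description points at.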