[ https://issues.apache.org/jira/browse/HADOOP-2919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12577260#action_12577260 ]

Chris Douglas commented on HADOOP-2919:
---------------------------------------

I haven't been able to reproduce this failure on Linux or MacOS. Looking at the 
console output, the timeout looks related to HADOOP-2971. I'm seeing a handful 
of errors like the following from Hudson:
{noformat}
    [junit] 2008-03-10 23:22:51,803 INFO  dfs.DataNode (DataNode.java:run(1985)) - PacketResponder blk_1646669170773132170 1 Exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for /127.0.0.1:34190 (local: /127.0.0.1:34496) to be ready for read
    [junit]   at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:188)
    [junit]   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:135)
    [junit]   at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:121)
    [junit]   at java.io.DataInputStream.readFully(DataInputStream.java:176)
    [junit]   at java.io.DataInputStream.readLong(DataInputStream.java:380)
    [junit]   at org.apache.hadoop.dfs.DataNode$PacketResponder.run(DataNode.java:1957)
    [junit]   at java.lang.Thread.run(Thread.java:595)
{noformat}
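
For context on what the trace above means: the PacketResponder thread gave up 
after 60 seconds waiting for its peer's socket to become readable. Below is a 
minimal, hypothetical sketch of the general read-with-timeout pattern behind a 
SocketTimeoutException like this one; it is not Hadoop's SocketIOWithTimeout 
implementation, and the class and method names are illustrative only.

{code:java}
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

/** Hypothetical sketch of a read that fails with SocketTimeoutException
 *  when the peer sends nothing for timeoutMs milliseconds. */
public class TimedSocketRead {
  static int readWithTimeout(SocketChannel ch, ByteBuffer buf, long timeoutMs)
      throws IOException {
    Selector selector = Selector.open();
    try {
      ch.configureBlocking(false);
      ch.register(selector, SelectionKey.OP_READ);
      // Block until the channel is ready for read, or give up after timeoutMs.
      if (selector.select(timeoutMs) == 0) {
        throw new SocketTimeoutException(timeoutMs
            + " millis timeout while waiting for channel to be ready for read");
      }
      return ch.read(buf);
    } finally {
      selector.close();
    }
  }
}
{code}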

Since the failure is coming from TestMiniMRDFSSort, which exercises code this 
patch certainly affects, this result is not auspicious, but I suspect the issue 
is not related to this patch.

> Create fewer copies of buffer data during sort/spill
> ----------------------------------------------------
>
>                 Key: HADOOP-2919
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2919
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Chris Douglas
>            Assignee: Chris Douglas
>             Fix For: 0.17.0
>
>         Attachments: 2919-0.patch, 2919-1.patch, 2919-2.patch, 2919-3.patch
>
>
> Currently, the sort/spill works as follows:
> Let r be the number of partitions
> For each call to collect(K,V) from map:
> * If buffers do not exist, allocate a new DataOutputBuffer to collect K,V 
> bytes and allocate r buffers for collecting K,V offsets
> * Write K,V into buffer, noting offsets
> * Register offsets with the associated partition buffer, allocating/copying 
> accounting buffers if necessary
> * Calculate the total mem usage for buffer and all partition collectors by 
> iterating over the collectors
> * If total mem usage is greater than half of io.sort.mb, then start a new 
> thread to spill, blocking if another spill is in progress
> For each spill (assuming no combiner):
> * Save references to our K,V byte buffer and accounting data, setting the 
> former to null (will be recreated on the next call to collect(K,V))
> * Open a SequenceFile.Writer for this partition
> * Sort each partition separately (the current version of the sort reuses 
> indices, but still requires wrapping them in IntWritable objects)
> * Build a RawKeyValueIterator of sorted data for the partition
> * Deserialize each key and value and call SequenceFile::append(K,V) on the 
> writer for this partition
> There are a number of opportunities for reducing the number of copies, 
> creations, and operations we perform in this stage, particularly since 
> growing many of the buffers involved requires that we copy the existing data 
> to the newly sized allocation.
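
To make the copy costs described above concrete, here is a small, hypothetical 
sketch of the buffering scheme the description walks through: serialized K,V 
bytes land in one growable buffer while per-partition arrays track their 
offsets, and each time an accounting array fills it is reallocated and its 
contents copied. The class and field names are illustrative, not the actual 
MapTask code, and the spill step is stubbed out.

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Hypothetical sketch of collect-side buffering; not the real MapTask code. */
public class CollectSketch {
  private final ByteArrayOutputStream kvBytes = new ByteArrayOutputStream(); // serialized K,V
  private final DataOutputStream out = new DataOutputStream(kvBytes);
  private final int[][] offsets;  // per-partition record offsets into kvBytes
  private final int[] counts;     // records registered per partition
  private final long softLimit;   // e.g. half of io.sort.mb, in bytes

  public CollectSketch(int numPartitions, long sortBufferBytes) {
    offsets = new int[numPartitions][16];
    counts = new int[numPartitions];
    softLimit = sortBufferBytes / 2;
  }

  /** Serialize K,V, register its offset with the partition, maybe spill. */
  public void collect(String key, String value, int partition) throws IOException {
    int recordStart = kvBytes.size();
    out.writeUTF(key);
    out.writeUTF(value);
    // Growing the accounting array means allocating a larger one and copying
    // the old contents, which is one of the copies this issue aims to reduce.
    if (counts[partition] == offsets[partition].length) {
      offsets[partition] =
          Arrays.copyOf(offsets[partition], offsets[partition].length * 2);
    }
    offsets[partition][counts[partition]++] = recordStart;
    if (kvBytes.size() > softLimit) {
      spill();
    }
  }

  private void spill() {
    // In the real code a spill thread sorts each partition's offsets by key,
    // deserializes the records, and appends them to a SequenceFile.Writer;
    // this sketch simply discards the buffered data.
    kvBytes.reset();
    Arrays.fill(counts, 0);
  }
}
{code}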

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
