[jira] Updated: (HADOOP-3514) Reduce seeks during shuffle, by inline crcs

Jothi Padmanabhan (JIRA) Mon, 08 Sep 2008 04:55:38 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-3514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jothi Padmanabhan updated HADOOP-3514:
--------------------------------------

    Status: Patch Available  (was: Open)

The above patch fixes the problem observed when running with the native LZO
library for map output compression.
The problem was in the read() method of IFileInputStream, which must return
the byte read as an int in the range 0 to 255, with -1 reserved for end of
stream (the java.io.InputStream contract). A plain assignment such as
{code}
int result = b;         // b is the byte just read
{code}
does not work, because Java bytes are signed: the value is sign-extended, so
any byte above 0x7F produces a negative integer (0xFF becomes -1, the
end-of-stream marker). The byte has to be masked instead:
{code}
int result = b & 0xFF;  // clears the sign extension; 0xFF now yields 255
{code}
which assigns the correct unsigned value to the integer.
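A small self-contained Java sketch of the pitfall, for illustration only:
SignedByteDemo, ArrayBackedStream and countBytes are hypothetical names, not
part of Hadoop, and the stream simply serves bytes from an array. A caller
that loops until read() returns -1 stops early at a 0xFF byte when the mask
is missing.

```java
import java.io.InputStream;

public class SignedByteDemo {

    // Hypothetical stream serving bytes from an array (not IFileInputStream).
    // The mask flag toggles between the buggy and the fixed conversion.
    static class ArrayBackedStream extends InputStream {
        private final byte[] data;
        private final boolean mask;
        private int pos = 0;

        ArrayBackedStream(byte[] data, boolean mask) {
            this.data = data;
            this.mask = mask;
        }

        @Override
        public int read() {
            if (pos >= data.length) {
                return -1;                    // end of stream
            }
            byte b = data[pos++];
            // Unmasked, b is sign-extended: (byte) 0xFF becomes -1.
            return mask ? (b & 0xFF) : b;
        }
    }

    // Counts bytes the way a typical caller does: until read() returns -1.
    static int countBytes(ArrayBackedStream in) {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] data = {0x01, (byte) 0xFF, 0x02};
        System.out.println(countBytes(new ArrayBackedStream(data, false))); // 1
        System.out.println(countBytes(new ArrayBackedStream(data, true)));  // 3
    }
}
```

With the mask, all three bytes are counted; without it, the 0xFF byte is
mistaken for end of stream and only the first byte is counted.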

The following performance improvements were observed with this patch when
running the loadgen program with 60 reduce copiers, 100 HTTP threads and
450 task trackers, using this command line:
<pre>
bin/hadoop jar hadoop-0.19.0-dev-test.jar loadgen \
-D test.randomtextwrite.bytes_per_map=$((240*1024)) \
-D test.randomtextwrite.total_bytes=$((200*1024*100000)) \
-D mapred.compress.map.output=false \
-r 2200 \
-outKey org.apache.hadoop.io.Text \
-outValue org.apache.hadoop.io.Text \
-outFormat org.apache.hadoop.mapred.lib.NullOutputFormat \
-outdir fakeout
</pre>

The patch showed an overall improvement of about 5%, with about 10%
improvement in the shuffle.

                 Trunk     Patch
Map time          6:10      6:04
Reduce time      17:11     16:22
Overall          23:21     22:26
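The relative change for each row of the table can be recomputed with a short
Java sketch. It assumes the times are minutes:seconds, which the comment does
not state explicitly; ShuffleTimings, toSeconds and improvementPercent are
hypothetical helpers.

```java
public class ShuffleTimings {

    // Converts an "m:ss" string to seconds (assumes minutes:seconds).
    static int toSeconds(String mss) {
        String[] parts = mss.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Relative improvement of the patch over trunk, in percent.
    static double improvementPercent(String trunk, String patch) {
        int t = toSeconds(trunk);
        int p = toSeconds(patch);
        return 100.0 * (t - p) / t;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"Map time", "6:10", "6:04"},
            {"Reduce time", "17:11", "16:22"},
            {"Overall", "23:21", "22:26"},
        };
        for (String[] r : rows) {
            System.out.printf("%-12s %5.1f%%%n", r[0], improvementPercent(r[1], r[2]));
        }
    }
}
```

Under that assumption the map, reduce and overall times drop by about 1.6%,
4.8% and 3.9% respectively; the shuffle phase is not broken out separately in
the table.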


> Reduce seeks during shuffle, by inline crcs
> -------------------------------------------
>
>                 Key: HADOOP-3514
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3514
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Jothi Padmanabhan
>             Fix For: 0.19.0
>
>         Attachments: hadoop-3514-v1.patch, hadoop-3514-v10.patch, 
> hadoop-3514-v11.patch, hadoop-3514-v12.patch, hadoop-3514-v2.patch, 
> hadoop-3514-v3.patch, hadoop-3514-v4.patch, hadoop-3514-v5.patch, 
> hadoop-3514-v6.patch, hadoop-3514-v7.patch, hadoop-3514-v8.patch, 
> hadoop-3514-v9.patch, hadoop-3514.patch
>
>
> The number of seeks can be reduced by half in the IFile if we move the CRC
> into the IFile rather than keeping it in a separate file.
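The idea in the description can be sketched in a few lines of Java. This is a
minimal illustration under assumed names (InlineCrcSketch, appendCrc and
verifyAndStrip are hypothetical) and is not the actual IFile format: it stores
one CRC32 for the whole segment as a 4-byte trailer. Because checksum and data
sit in the same file, a reader positions once and reads both, instead of
seeking in the data file and again in a separate checksum file.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

public class InlineCrcSketch {

    // CRC32 over the first len bytes, truncated to its low 32 bits.
    static int crcOf(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return (int) crc.getValue();
    }

    // Writes the payload followed by a 4-byte big-endian CRC trailer,
    // so data and checksum live in one contiguous record.
    static byte[] appendCrc(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt(crcOf(payload, payload.length));
        return buf.array();
    }

    // Reads the trailer, checks it against the payload, returns the payload.
    static byte[] verifyAndStrip(byte[] framed) {
        if (framed.length < 4) {
            throw new IllegalArgumentException("record shorter than its checksum");
        }
        int n = framed.length - 4;
        int stored = ByteBuffer.wrap(framed, n, 4).getInt();
        if (stored != crcOf(framed, n)) {
            throw new IllegalStateException("checksum mismatch");
        }
        return Arrays.copyOf(framed, n);
    }

    public static void main(String[] args) {
        byte[] payload = "map output segment".getBytes(StandardCharsets.UTF_8);
        byte[] framed = appendCrc(payload);
        System.out.println(new String(verifyAndStrip(framed), StandardCharsets.UTF_8));

        framed[0] ^= 0x01;  // flip one bit to simulate corruption
        try {
            verifyAndStrip(framed);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The trailer layout keeps the writer streaming: the checksum is only known once
the whole payload has been written, so appending it at the end avoids any
seek back to the start of the record.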

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
