[jira] [Comment Edited] (TEZ-1803) Support > 2gb sort buffer in pipelinedsorter

Rajesh Balamohan (JIRA) Wed, 21 Jan 2015 06:00:00 -0800

    [ 
https://issues.apache.org/jira/browse/TEZ-1803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285638#comment-14285638
 ]


Rajesh Balamohan edited comment on TEZ-1803 at 1/21/15 1:58 PM:
----------------------------------------------------------------

- Ran terasort (500 GB) on a test cluster.
- Used a 4 GB container and set tez.runtime.io.sort.mb to 1200 and 2500 
respectively.  Couldn't test with a larger container size due to another 
issue; raised a separate JIRA for that.
- With (tez.runtime.io.sort.mb = 2134 and pipelinedsorter with 2 threads), Map 
Phase time: 257 secs.  2500 was requested; 2134 is the effective value after 
memory distribution, as the log shows.
{code}
INFO [Initializer 1] impl.ExternalSorter: Requested SortBufferSize (tez.runtime.io.sort.mb): 2500
INFO [TezChild] resources.MemoryDistributor: Informing: OUTPUT, finalreduce, org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput: requested=2621440000, allocated=2238311301
INFO [TezChild] impl.PipelinedSorter: Number of Blocks : 2, maxMemUsage=2237661184, BLOCK_SIZE=1610612736
INFO [TezChild] impl.PipelinedSorter: tez.runtime.io.sort.mb = 2134
{code}

- With (tez.runtime.io.sort.mb = 1200 and pipelinedsorter with 2 threads), Map 
Phase time: 294 secs
{code}
INFO [Initializer 1] impl.ExternalSorter: Requested SortBufferSize (tez.runtime.io.sort.mb): 1200
INFO [TezChild] resources.MemoryDistributor: Informing: OUTPUT, finalreduce, org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput: requested=1258291200, allocated=1258291200
INFO [TezChild] impl.PipelinedSorter: Number of Blocks : 1, maxMemUsage=1258291200, BLOCK_SIZE=1610612736
INFO [TezChild] impl.PipelinedSorter: tez.runtime.io.sort.mb = 1200
{code}
- Effectively, there is a 12-13% improvement in map phase runtime (294 secs 
down to 257 secs).
- Apart from this, there is a good amount of savings in disk spills.  Will 
attach the tez-ui counters page separately here.
- Ran teravalidate benchmark to validate the results.
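
The numbers in the logs above can be cross-checked with a little arithmetic. 
The following is only a sketch, not Tez code: the helper names (to_mb, 
block_count, improvement_pct) are hypothetical, and the block-count rule 
(ceiling of maxMemUsage / BLOCK_SIZE, minimum 1) is an assumption that happens 
to match the logged values rather than something confirmed from the 
PipelinedSorter source.

```python
import math

MB = 1024 * 1024  # bytes per MB; matches requested=2621440000 for 2500 MB

# BLOCK_SIZE as logged by PipelinedSorter in both runs (1536 MB).
BLOCK_SIZE = 1610612736


def to_mb(num_bytes):
    """Convert a logged byte count to whole megabytes (rounded down)."""
    return num_bytes // MB


def block_count(max_mem_usage, block_size=BLOCK_SIZE):
    """Assumed rule: ceil(max_mem_usage / block_size), at least one block."""
    return max(1, math.ceil(max_mem_usage / block_size))


def improvement_pct(baseline_secs, new_secs):
    """Runtime improvement relative to the baseline run, in percent."""
    return 100.0 * (baseline_secs - new_secs) / baseline_secs


# Values copied from the two log excerpts and the reported map phase times.
runs = [
    {"requested_mb": 2500, "allocated": 2238311301,
     "max_mem_usage": 2237661184, "map_secs": 257},
    {"requested_mb": 1200, "allocated": 1258291200,
     "max_mem_usage": 1258291200, "map_secs": 294},
]

for run in runs:
    print(
        "requested=%d MB, allocated=%d MB, effective sort.mb=%d, blocks=%d"
        % (
            run["requested_mb"],
            to_mb(run["allocated"]),
            to_mb(run["max_mem_usage"]),
            block_count(run["max_mem_usage"]),
        )
    )

print("map phase improvement: %.1f%%" % improvement_pct(294, 257))
```

Under these assumptions, the 2500 MB request works out to an effective 2134 MB 
spread over 2 blocks, the 1200 MB request fits in a single block, and the 
improvement is measured against the 294 secs baseline.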


was (Author: rajesh.balamohan):
- Ran terasort (500 GB) on test cluster. 
- Changed tez.runtime.io.sort.mb with 4 GB container (i.e io.sort.mb with 1200 
and 2500 respectively).  Couldn't test with higher container size due to some 
other issue, raised a JIRA separately for that.
- With (tez.runtime.io.sort.mb = 2134 and pipelinedsorter with 2 threads), Map 
Phase time: 257 secs
- With (tez.runtime.io.sort.mb = 1200 and pipelinedsorter with 2 threads), Map 
Phase time: 294 secs
- Effectively there is a 12-13% runtime improvement.
- Apart from this, there is good amount of savings in disk spills.  Will attach 
the tez-ui counters page separately here.
- Ran teravalidate benchmark to validate the results.

> Support > 2gb sort buffer in pipelinedsorter
> --------------------------------------------
>
>                 Key: TEZ-1803
>                 URL: https://issues.apache.org/jira/browse/TEZ-1803
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>              Labels: performance
>         Attachments: TEZ-1803.1.patch, TEZ-1803.2.patch, TEZ-1803.3.patch, 
> TEZ-1803.4.patch, TEZ-1803.5.patch, TEZ-1803.6.patch, TEZ-1803.WIP.1.patch, 
> map_task_hive_query_92.png, mapphase_counters_with_1200mb.png, 
> mapphase_counters_with_2500mb.png, mapphase_time_with_1200mb.png, 
> mapphase_time_with_2500mb.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
