[jira] [Updated] (TEZ-2001) Support pipelined data transfer for ordered output

Rajesh Balamohan (JIRA) Mon, 23 Feb 2015 01:09:40 -0800

     [ 
https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Rajesh Balamohan updated TEZ-2001:
----------------------------------
    Attachment: benchmark_q17_10TB.png

Attaching Hive+Tez query_17 results at 10 TB scale on a 20-node cluster. There 
is around a 20% improvement in overall job runtime.  On a larger cluster (where 
the reducer_2 tasks fit in a single wave rather than the multiple waves needed 
on this cluster), the runtime would likely have been better still.  Note that 
with pipelined shuffle the first event is received downstream at around 112 
seconds, so essentially all downstream tasks could have started downloading 
data at that point had cluster capacity been available.


||AppId||Runtime (seconds)||Sorter||Pipelined transfer||Disable Final Merge||FirstEvent (seconds)||LastEvent (seconds)||Total Time (slowest reducer_2 task)||
|application_1424502260528_0036|553.76|PipelinedSorter|No|No|432|432|456|
|application_1424502260528_0037|572.33|PipelinedSorter|No|No|448|448|471|
|application_1424502260528_0038|477.71|PipelinedSorter|No|Yes|358|358|380|
|application_1424502260528_0039|465.134|PipelinedSorter|No|Yes|329|329|353|
|application_1424502260528_0040|478|PipelinedSorter|Yes|Yes|112|357|381|
|application_1424502260528_0041|486|PipelinedSorter|Yes|Yes|112|363|386|
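
For anyone who wants to re-derive the numbers from the table, here is a rough 
sketch of the arithmetic. It averages the two runs per configuration, compares 
each average against the baseline (no pipelined transfer, final merge 
enabled), and computes the gap between the first and last event, i.e. the time 
during which downstream tasks could in principle already have been fetching. 
The helper names (mean_runtime, improvement_pct, fetch_window) are made up for 
illustration and are not part of Tez; the figures are copied from the table.

```python
from statistics import mean

# Values copied from the table above:
# (runtime_s, pipelined_transfer, final_merge_disabled, first_event_s, last_event_s)
RUNS = {
    "application_1424502260528_0036": (553.76, False, False, 432, 432),
    "application_1424502260528_0037": (572.33, False, False, 448, 448),
    "application_1424502260528_0038": (477.71, False, True, 358, 358),
    "application_1424502260528_0039": (465.134, False, True, 329, 329),
    "application_1424502260528_0040": (478, True, True, 112, 357),
    "application_1424502260528_0041": (486, True, True, 112, 363),
}


def mean_runtime(pipelined, final_merge_disabled):
    """Average job runtime (seconds) over the runs with the given settings."""
    return mean(
        r[0] for r in RUNS.values()
        if r[1] == pipelined and r[2] == final_merge_disabled
    )


def improvement_pct(baseline, candidate):
    """Runtime reduction of candidate relative to baseline, in percent."""
    return 100.0 * (baseline - candidate) / baseline


def fetch_window(run):
    """Seconds between the first and the last event seen downstream."""
    return run[4] - run[3]


baseline = mean_runtime(pipelined=False, final_merge_disabled=False)
no_merge = mean_runtime(pipelined=False, final_merge_disabled=True)
pipelined = mean_runtime(pipelined=True, final_merge_disabled=True)

print(f"baseline mean runtime:       {baseline:.1f} s")
print(f"final merge disabled:        {improvement_pct(baseline, no_merge):.1f}% faster")
print(f"pipelined + no final merge:  {improvement_pct(baseline, pipelined):.1f}% faster")
for app_id, run in RUNS.items():
    if run[1]:
        print(f"{app_id}: events spread over {fetch_window(run)} s")
```

With only two runs per configuration these averages are indicative at best. 
For the pipelined runs the events are spread over roughly four minutes, which 
is the overlap a larger cluster could exploit.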



> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
>                 Key: TEZ-2001
>                 URL: https://issues.apache.org/jira/browse/TEZ-2001
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>         Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch, TEZ-2001.3.patch, 
> benchmark_q17_10TB.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
