[ https://issues.apache.org/jira/browse/TEZ-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan updated TEZ-2001:
----------------------------------
Attachment: benchmark_q17_10TB.png
Attaching Hive+Tez query_17 results at 10 TB scale on a 20-node cluster. There
is around a 20% improvement in overall job runtime. On a bigger cluster (where
all reducer_2 tasks could fit in a single wave instead of running in multiple
waves as they do on this cluster), the runtime would likely have been better.
Note that with pipelined shuffle, the first event reaches downstream tasks at
around 112 seconds, so essentially all downstream tasks could have started
downloading data if cluster capacity had been available.
||AppId||Runtime(seconds)||Sorter||Pipelined transfer||Disable Final Merge||FirstEvent(seconds)||LastEvent(seconds)||Total Time (slowest reducer_2 task)||
|application_1424502260528_0036|553.76|PipelinedSorter|No|No|432|432|456|
|application_1424502260528_0037|572.33|PipelinedSorter|No|No|448|448|471|
|application_1424502260528_0038|477.71|PipelinedSorter|No|Yes|358|358|380|
|application_1424502260528_0039|465.134|PipelinedSorter|No|Yes|329|329|353|
|application_1424502260528_0040|478|PipelinedSorter|Yes|Yes|112|357|381|
|application_1424502260528_0041|486|PipelinedSorter|Yes|Yes|112|363|386|
> Support pipelined data transfer for ordered output
> --------------------------------------------------
>
> Key: TEZ-2001
> URL: https://issues.apache.org/jira/browse/TEZ-2001
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-2001.1.patch, TEZ-2001.2.patch, TEZ-2001.3.patch, benchmark_q17_10TB.png
>
>
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)