[ https://issues.apache.org/jira/browse/TEZ-4208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17166857#comment-17166857 ]
Rajesh Balamohan commented on TEZ-4208:
---------------------------------------
Q67 runtime with and without the patch on an internal cluster at 10 TB scale:
|| ||Without Patch||With Patch||
|Job Runtime (in seconds)|1961.63|1656.14|
|TaskCounter_Map_1_OUTPUT_Reducer_2:| | |
|OUTPUT_BYTES_PHYSICAL|457771151796|311823523913|
|OUTPUT_RECORDS|20169930972|20169930972|
|SHUFFLE_CHUNK_COUNT|37776|5193|
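
For a quick read of the table, the relative changes can be worked out directly from the numbers above. The snippet below is only a convenience sketch; the class and method names (PatchDelta, percentReduction) are made up for this illustration and are not part of Tez. It puts the runtime about 16% lower, the physical output bytes about 32% lower, and the shuffle chunk count about 86% lower, with OUTPUT_RECORDS unchanged.

```java
public class PatchDelta {
    // Percentage by which `after` is smaller than `before`.
    static double percentReduction(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // All values are copied from the Q67 table in this comment.
        System.out.printf("Job runtime:           %.1f%% lower%n",
                percentReduction(1961.63, 1656.14));
        System.out.printf("OUTPUT_BYTES_PHYSICAL: %.1f%% lower%n",
                percentReduction(457771151796L, 311823523913L));
        System.out.printf("SHUFFLE_CHUNK_COUNT:   %.1f%% lower%n",
                percentReduction(37776, 5193));
    }
}
```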
> Pipelinesorter uses single SortSpan after spill
> -----------------------------------------------
>
> Key: TEZ-4208
> URL: https://issues.apache.org/jira/browse/TEZ-4208
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Priority: Major
> Attachments: TEZ-4208.1.patch, q67_sorter.log
>
>
> Although the sorter may have created multiple spans, Tez always reuses the
> first span after a spill. Other spans may well be larger than the first one
> because space is allocated progressively. Fixing this would reduce the number
> of spills (depending on the job) and the load on index cache entries, since
> fewer files have to be opened.
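
The effect described in the issue can be illustrated with a toy model. This is a sketch under stated assumptions, not the Tez code or the TEZ-4208 patch: the Buffer class, the buffer sizes, the "pick the largest buffer" policy and the one-spill-per-full-buffer model are all hypothetical and chosen only to show why reusing a small first buffer leads to more spills.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SpanSelectionSketch {
    // Hypothetical stand-in for a sort buffer; NOT the Tez SortSpan class.
    static final class Buffer {
        final int index;
        final long capacityMiB;
        Buffer(int index, long capacityMiB) {
            this.index = index;
            this.capacityMiB = capacityMiB;
        }
    }

    // Assumed progressive allocation: each later buffer is larger.
    static List<Buffer> exampleBuffers() {
        return Arrays.asList(new Buffer(0, 256), new Buffer(1, 512), new Buffer(2, 1024));
    }

    // Behaviour described in the issue: always reuse the first buffer.
    static Buffer pickFirst(List<Buffer> buffers) {
        return buffers.get(0);
    }

    // One possible alternative policy (an assumption): reuse the largest buffer.
    static Buffer pickLargest(List<Buffer> buffers) {
        return Collections.max(buffers,
                Comparator.comparingLong((Buffer b) -> b.capacityMiB));
    }

    // Simplified model: one spill every time the reused buffer fills up.
    static long spillsNeeded(long totalMiB, long bufferMiB) {
        return (totalMiB + bufferMiB - 1) / bufferMiB; // ceiling division
    }

    public static void main(String[] args) {
        List<Buffer> buffers = exampleBuffers();
        long totalMiB = 10 * 1024; // hypothetical 10 GiB of map output
        System.out.println("first buffer:   "
                + spillsNeeded(totalMiB, pickFirst(buffers).capacityMiB) + " spills");
        System.out.println("largest buffer: "
                + spillsNeeded(totalMiB, pickLargest(buffers).capacityMiB) + " spills");
    }
}
```

In this model the spill count scales inversely with the size of the reused buffer, and every spill produces another file whose index has to be tracked, which is the link to the index cache load mentioned in the description.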
--
This message was sent by Atlassian Jira
(v8.3.4#803005)