[ https://issues.apache.org/jira/browse/TEZ-3605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16045101#comment-16045101 ]

Siddharth Seth commented on TEZ-3605:
-------------------------------------

In PipelinedSorter, the final merge does not necessarily skip a fully empty 
partition. The check made while creating DiskSegments can produce an empty 
list, and the merger is then invoked on that empty list (not sure how this is 
handled). Similarly in DefaultSorter, I think mergeParts needs some work.

It would be useful to have tests for both sorters, i.e. with multiple spills 
involved: 1) one spill has data for a partition and another does not, 2) no 
spill has data for the partition.
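
To make the two scenarios concrete, here is a minimal, self-contained sketch of
the guard being asked for. It does not use the Tez classes: SpillIndex,
segmentsFor and partitionsToMerge are hypothetical names invented for this
illustration, and the real PipelinedSorter/DefaultSorter code is structured
differently. The only point is that a partition whose segment list comes out
empty should be skipped before any merger is invoked.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyPartitionMergeSketch {

    /** Hypothetical stand-in for one spill's index: bytes written per partition. */
    static final class SpillIndex {
        final long[] partLengths;

        SpillIndex(long... partLengths) {
            this.partLengths = partLengths;
        }
    }

    /** Returns the ids of the spills that actually hold data for the partition. */
    static List<Integer> segmentsFor(List<SpillIndex> spills, int partition) {
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < spills.size(); i++) {
            if (spills.get(i).partLengths[partition] > 0) {
                segments.add(i);
            }
        }
        return segments;
    }

    /** Returns the partitions for which a merge would be run. */
    static List<Integer> partitionsToMerge(List<SpillIndex> spills, int numPartitions) {
        List<Integer> toMerge = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            List<Integer> segments = segmentsFor(spills, p);
            if (segments.isEmpty()) {
                // Fully empty partition: never hand an empty list to the merger.
                continue;
            }
            toMerge.add(p);
        }
        return toMerge;
    }

    public static void main(String[] args) {
        // Partition 0: present in spill 0 only (scenario 1).
        // Partition 1: present in spill 1 only (scenario 1, mirrored).
        // Partition 2: absent from every spill (scenario 2).
        List<SpillIndex> spills = Arrays.asList(
                new SpillIndex(10, 0, 0),
                new SpillIndex(0, 5, 0));

        System.out.println(segmentsFor(spills, 0));       // prints [0]
        System.out.println(segmentsFor(spills, 2));       // prints []
        System.out.println(partitionsToMerge(spills, 3)); // prints [0, 1]
    }
}
```

A test along these lines for each sorter would assert that partition 2 produces
no output segment at all, and that partition 0 is merged from a single spill
only.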

> Detect and prune empty partitions for the Ordered case
> ------------------------------------------------------
>
>                 Key: TEZ-3605
>                 URL: https://issues.apache.org/jira/browse/TEZ-3605
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-3605.001.patch, TEZ-3605.002.patch, 
> TEZ-3605.003.patch, TEZ-3605.004.patch, TEZ-3605.005.patch, 
> TEZ-3605.006.patch, TEZ-3605.007.patch, TEZ-3605.008.patch, 
> TEZ-3605.009.patch, TEZ-3605.010.patch
>
>
> Analogous to the Unordered case we should not have empty partition 
> entries/segments in the Ordered/DefaultSorter case. This will save writing 
> unnecessary data.
> Additionally, with tez_shuffle feature (TEZ-3334), in a heavily auto reduced 
> job, this change would allow not fetching empty partitions and then throwing 
> them away.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
