[ 
https://issues.apache.org/jira/browse/TEZ-3552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3552:
-------------------------
    Attachment: TEZ-3552-2.patch

Oh, I didn't upload the patch. Thanks [~aplusplus]. Here is the test code to
get the unit test to pass and to verify the functionality. The validation
assumes that, given a decent number of input splits, Collections.shuffle is
almost certain to move at least some elements.
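
The assumption behind that validation can be sketched as follows. This is not
the test from the patch; the class name ShuffleCheck, the method orderChanged,
and the use of a seeded Random are hypothetical and only illustrate the idea.
For n distinct elements a uniform shuffle returns the original order with
probability 1/n!, which is negligible once n is moderately large.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not taken from the TEZ-3552 patch.
class ShuffleCheck {

    // Returns true if at least one element changed position after shuffling.
    static boolean orderChanged(int numSplits, long seed) {
        List<Integer> original = new ArrayList<>();
        for (int i = 0; i < numSplits; i++) {
            original.add(i);
        }
        List<Integer> shuffled = new ArrayList<>(original);
        // A seeded Random keeps the check reproducible from run to run.
        Collections.shuffle(shuffled, new Random(seed));
        return !shuffled.equals(original);
    }

    public static void main(String[] args) {
        System.out.println("order changed: " + orderChanged(100, 42L));
    }
}
```

A check like this only makes sense with enough splits: with a single split the
order can never change, so the assertion would always fail.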

> Shuffle split array when size-based sorting is turned off
> ---------------------------------------------------------
>
>                 Key: TEZ-3552
>                 URL: https://issues.apache.org/jira/browse/TEZ-3552
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Ming Ma
>            Assignee: Zhiyuan Yang
>         Attachments: TEZ-3552-2.patch, TEZ-3552.1.patch
>
>
> TEZ-3430 adds the functionality to skip size-based split sorting to help with
> job runtime. Further testing showed that, for certain inputs, the original
> split array before sorting isn't randomly distributed in size. So when split
> sorting is turned off, we should shuffle the splits instead of doing nothing.
> That will make the size distribution more even.
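
A minimal sketch of the behaviour proposed above, assuming a boolean flag
selects between size-based sorting and shuffling. The names SplitOrdering,
Split, and order are hypothetical and are not Tez APIs, and the largest-first
sort direction is also an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not the actual Tez implementation.
class SplitOrdering {

    // Stand-in for an input split; only the length matters here.
    static final class Split {
        final String name;
        final long length;

        Split(String name, long length) {
            this.name = name;
            this.length = length;
        }
    }

    // Returns a reordered copy of the splits and leaves the input untouched.
    static List<Split> order(List<Split> splits, boolean sortBySize, Random random) {
        List<Split> result = new ArrayList<>(splits);
        if (sortBySize) {
            // Size-based sorting, largest first (assumed direction).
            result.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
        } else {
            // Sorting is off: shuffle rather than keep the original,
            // possibly size-skewed, order.
            Collections.shuffle(result, random);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Split> splits = new ArrayList<>();
        for (int i = 1; i <= 8; i++) {
            splits.add(new Split("split-" + i, i * 64L));
        }
        for (Split s : order(splits, false, new Random())) {
            System.out.println(s.name + " " + s.length);
        }
    }
}
```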



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)