Zhiyuan Yang created TEZ-3708:
---------------------------------

             Summary: Arbitrary parallelism for unpartitioned cartesian product regardless of # src tasks
                 Key: TEZ-3708
                 URL: https://issues.apache.org/jira/browse/TEZ-3708
             Project: Apache Tez
          Issue Type: Sub-task
            Reporter: Zhiyuan Yang
            Assignee: Zhiyuan Yang


The current unpartitioned cartesian product has a few limitations:
1. Parallelism may be insufficient when splits are large and the number of source tasks is small.
2. Parallelism may be excessive when the number of source tasks is large.
3. Workload is not evenly distributed across workers. Even with auto grouping, grouping by size may be inaccurate, because the same data size can correspond to a different number of records and therefore a different number of cartesian product operations.
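A minimal sketch of the arithmetic behind these limitations, assuming (as the summary implies) that parallelism currently depends on the source task counts, modeled here as one destination task per combination of source tasks, so parallelism is the product of the source vertices' task counts. The per-task work is modeled as the product of the record counts of the chunks a destination task receives. The class and method names are hypothetical and are not Tez APIs.

```java
// Hypothetical illustration of the limitations above; not Tez code.
public class CartesianProductSketch {
    // Assumed model: one destination task per combination of source tasks,
    // so parallelism is the product of the source task counts.
    static long parallelism(int[] srcTaskCounts) {
        long p = 1;
        for (int n : srcTaskCounts) {
            p *= n;
        }
        return p;
    }

    // Work for one destination task: the number of output tuples,
    // i.e. the product of the record counts of the chunks it receives.
    static long productOps(long[] recordsPerChunk) {
        long ops = 1;
        for (long r : recordsPerChunk) {
            ops *= r;
        }
        return ops;
    }

    public static void main(String[] args) {
        // Limitation 1: two sources with 2 (large) tasks each -> only 4 destination tasks.
        System.out.println(parallelism(new int[] {2, 2}));       // 4
        // Limitation 2: two sources with 1000 tasks each -> 1,000,000 destination tasks.
        System.out.println(parallelism(new int[] {1000, 1000})); // 1000000
        // Limitation 3: chunks of equal byte size but different record counts
        // (e.g. 1 KB of 10-byte records vs 1 KB of 100-byte records).
        System.out.println(productOps(new long[] {100, 100}));   // 10000
        System.out.println(productOps(new long[] {10, 10}));     // 100
    }
}
```

Under this model, two destination tasks that receive the same number of bytes can differ in work by orders of magnitude, which is why size-based grouping alone may not balance the load.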



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
