Zhiyuan Yang created TEZ-3708:
---------------------------------
Summary: Arbitrary parallelism for unpartitioned cartesian product
regardless of # src tasks
Key: TEZ-3708
URL: https://issues.apache.org/jira/browse/TEZ-3708
Project: Apache Tez
Issue Type: Sub-task
Reporter: Zhiyuan Yang
Assignee: Zhiyuan Yang
The current unpartitioned cartesian product has a few limitations:
1. Parallelism can be too low when splits are large and the # of src tasks is small.
2. Parallelism can be too high when the # of src tasks is large.
3. Workload is not ideally distributed across workers. Even with auto
grouping, grouping by size may not be accurate, because the same size can mean
a different # of records and a different # of cartesian product ops.
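
The three limitations above can be illustrated with a little arithmetic. This is
only a sketch, assuming that without grouping the product vertex runs one task
per combination of source tasks; the function names and the numbers are
hypothetical and are not part of the Tez API.

```python
from math import prod


def unpartitioned_parallelism(src_task_counts):
    """Task count of the product vertex, ASSUMING one task per
    combination of source tasks (illustrative model, not Tez code)."""
    return prod(src_task_counts)


def product_ops(record_counts):
    """Number of output tuples a single combination has to produce."""
    return prod(record_counts)


# Limitations 1 and 2: parallelism is dictated by the # of src tasks alone.
few_large_splits = unpartitioned_parallelism([2, 2])
many_src_tasks = unpartitioned_parallelism([1000, 1000])

# Limitation 3: two chunks of the same byte size (made-up numbers) can
# hold very different record counts, so the work they imply differs too.
chunk_bytes = 1_000_000
records_small_rows = chunk_bytes // 100       # 100-byte records
records_large_rows = chunk_bytes // 10_000    # 10,000-byte records
other_side_records = 1_000

ops_small_rows = product_ops([records_small_rows, other_side_records])
ops_large_rows = product_ops([records_large_rows, other_side_records])

print(few_large_splits, many_src_tasks)
print(ops_small_rows, ops_large_rows)
```

Under these assumptions, two sources with 2 tasks each give only 4 product
tasks no matter how large the splits are, while 1000 tasks each give 1,000,000.
The two equally sized chunks differ by a factor of 100 in cartesian product ops.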
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)