[
https://issues.apache.org/jira/browse/TEZ-3739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16026633#comment-16026633
]
Zhiyuan Yang commented on TEZ-3739:
-----------------------------------
Thanks [~sseth] for the review!
bq. Tasks generating a large number of records, which would send a normal case
way past the target maxParallelism.
This case was tested in testDAGVertexOnlyGroupByMaxParallelism.
bq. Equal number of records from each source, and validate an equal weight to
each of them
This case was tested in testDAGVertexOnlyGroupByMinOpsPerWorker (the record
counts are not exactly equal, but similar).
bq. Parallelism = MaxParallelism (instead of getting close to maxParallelism)
This is similar to the case with too many records, where we cap parallelism
at maxParallelism.
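To make the failure mode from the issue description and the capping behaviour
concrete, here is a minimal sketch. It is not the code in the attached patch:
the class name, the method names, and the square-root formula are all
assumptions, chosen only to show how a weighted factorization that is exact in
real numbers can overshoot the target once each factor is rounded to an
integer, and how capping brings it back.

```java
public class WeightedSplitSketch {
    // Hypothetical real-valued split: p1 * p2 == target and p1 / p2 == n1 / n2.
    static double[] realSplit(long n1, long n2, int target) {
        double ratio = (double) n1 / n2;
        double p1 = Math.sqrt(target * ratio);
        double p2 = Math.sqrt(target / ratio);
        return new double[] {p1, p2};
    }

    // Naive integer version: round each factor, keeping at least one task per side.
    static long[] naiveIntSplit(long n1, long n2, int target) {
        double[] real = realSplit(n1, n2, target);
        long p1 = Math.max(1, Math.round(real[0]));
        long p2 = Math.max(1, Math.round(real[1]));
        return new long[] {p1, p2};
    }

    // Capped version: limit each factor so the product never exceeds target.
    static long[] cappedIntSplit(long n1, long n2, int target) {
        long[] naive = naiveIntSplit(n1, n2, target);
        long p1 = Math.min(naive[0], target);
        long p2 = Math.min(naive[1], Math.max(1, target / p1));
        return new long[] {p1, p2};
    }

    public static void main(String[] args) {
        // One record on one side, a million on the other, target parallelism 100.
        long[] naive = naiveIntSplit(1L, 1_000_000L, 100);
        long[] capped = cappedIntSplit(1L, 1_000_000L, 100);
        System.out.println("naive:  " + naive[0] + " x " + naive[1]);
        System.out.println("capped: " + capped[0] + " x " + capped[1]);
    }
}
```

With one record on one side and a million on the other, the real-valued
factors are 0.01 and 10000. Rounding forces the small side up to 1, so the
naive product is 10000 instead of 100, while the capped variant stays at 100.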
> Fair CartesianProduct doesn't works well with huge difference in output size
> ----------------------------------------------------------------------------
>
> Key: TEZ-3739
> URL: https://issues.apache.org/jira/browse/TEZ-3739
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: TEZ-3739.1.patch
>
>
> Specifically, the weighted factorization of the initial parallelism breaks
> down if the number of records on each side differs too much. The formula
> works with real numbers, but not with integers.