[
https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665901#comment-15665901
]
Ming Ma commented on TEZ-3458:
------------------------------
Thanks [~aplusplus].
* The Grouper abstraction is nice. Maybe
FairShuffleVertexManager#PartitionsGroupingCalculator should use it as well.
If you agree, it could be moved out of the cartesian product package.
* The grouping is based on a static per-source-vertex desiredBytesPerGroup, not
the aggregated size across all source vertices. So applications will need to
configure the value based on the desired aggregate input size and the number of
source vertices? I wonder if there is any scenario where different values of
desiredBytesPerGroup should be assigned to different source vertices
dynamically.
* vertexOutputBytes should be a long.
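To illustrate the concern (a standalone sketch, not patch code): accumulated output sizes above Integer.MAX_VALUE (~2 GiB) silently wrap around when stored in an int, which a long avoids.

```java
// Standalone illustration of why an int accumulator is unsafe for byte counts:
// past 2^31 - 1 the int wraps to a negative value, while a long stays correct.
public class OutputBytesOverflow {
  static int addAsInt(int acc, int delta) {
    return acc + delta;   // silently wraps past Integer.MAX_VALUE
  }

  static long addAsLong(long acc, long delta) {
    return acc + delta;   // correct up to Long.MAX_VALUE
  }
}
```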
* What if some source vertex doesn’t generate output?
{noformat}
if (vertexSentVME.size() != sourceVertices.size()) {
  return false;
}
{noformat}
* There are similar "ceil" calculations in various places, like
ShuffleUtils#ceil. Maybe we can define a common function for this that takes
care of the long-to-int cast and the overflow handling in ShuffleVertexManager.
{noformat}
int desiredNumGroup =
(int) ((vertexOutputBytes[i] + desiredBytesPerGroup - 1) /
desiredBytesPerGroup);
{noformat}
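A possible shape for such a shared helper (class and method names here are my assumptions, not existing Tez API): an overflow-safe ceiling division that also clamps the long-to-int cast.

```java
// Hypothetical shared helper sketch for the repeated "ceil" calculations.
public class GroupingMath {
  /** Returns ceil(bytes / bytesPerGroup), clamped to Integer.MAX_VALUE. */
  public static int ceilDiv(long bytes, long bytesPerGroup) {
    // Avoid the (bytes + bytesPerGroup - 1) form, which can overflow long
    // for very large inputs; divide first, then round up on any remainder.
    long groups = bytes / bytesPerGroup + (bytes % bytesPerGroup == 0 ? 0 : 1);
    // Clamp before the narrowing cast so a huge result cannot go negative.
    return (int) Math.min(groups, Integer.MAX_VALUE);
  }
}
```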
* What if the VME comes from some broadcast vertex?
{noformat}
int position = sourceVertices.indexOf(srcVertex);
{noformat}
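For a broadcast source, srcVertex would not be in sourceVertices, so indexOf returns -1 and using it as an array index would throw. A minimal sketch of the guard I'd expect (variable and method names are assumptions):

```java
import java.util.List;

// Sketch: a VM event from a vertex that is not a cartesian-product source
// (e.g. a broadcast input) yields position -1, which the caller should treat
// as "skip this event" rather than use as an array index.
public class BroadcastVmeGuard {
  static int sourcePosition(List<String> sourceVertices, String srcVertex) {
    int position = sourceVertices.indexOf(srcVertex);
    // -1 signals a non-source vertex; callers must check before indexing.
    return position;
  }
}
```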
> Auto grouping for cartesian product edge(unpartitioned case)
> ------------------------------------------------------------
>
> Key: TEZ-3458
> URL: https://issues.apache.org/jira/browse/TEZ-3458
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Zhiyuan Yang
> Assignee: Zhiyuan Yang
> Attachments: TEZ-3458.1.patch, TEZ-3458.2.patch, TEZ-3458.3.patch,
> TEZ-3458.4.patch
>
>
> Original CartesianProductVertexManagerUnpartitioned set parallelism as the
> product of all source vertices' parallelism, which may explode to an insane
> number. We should do auto reduce as in ShuffleVertexManager to avoid this.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)