[ 
https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15665901#comment-15665901
 ] 

Ming Ma commented on TEZ-3458:
------------------------------

Thanks [~aplusplus].

* The Grouper abstraction is nice. Maybe 
FairShuffleVertexManager#PartitionsGroupingCalculator should use it as well. 
If you agree, it can be moved out of cartesian product.
* The grouping is based on a static per-source-vertex desiredBytesPerGroup, not 
on the aggregated size across all source vertices. So applications will 
configure the value based on the desired aggregate input size and the number 
of source vertices? I wonder if there is any scenario that needs different 
values of desiredBytesPerGroup assigned dynamically to different source 
vertices.
* vertexOutputBytes should be long
* What if some source vertex doesn’t generate output?
{noformat}
      if (vertexSentVME.size() != sourceVertices.size()) {
        return false;
      }
{noformat}
* There are similar "ceil" calculations in various places, like 
ShuffleUtils#ceil. Maybe we can define a common function for it that takes 
care of the long-to-int cast and the overflow handling done in 
ShuffleVertexManager.
{noformat}
        int desiredNumGroup =
          (int) ((vertexOutputBytes[i] + desiredBytesPerGroup - 1) / desiredBytesPerGroup);
{noformat}
* What if the VME comes from some broadcast vertex?
{noformat}
      int position = sourceVertices.indexOf(srcVertex);
{noformat}
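
For illustration, the common "ceil" helper suggested above could look roughly like the sketch below. The class and method names (GroupingMath, ceilDiv, ceilDivToInt, totalGroups) are hypothetical and are not part of the patch or of Tez. Saturating at Integer.MAX_VALUE is only one possible overflow policy, assumed here for the sake of the example.

```java
// Hypothetical helper class; names and overflow policy are assumptions for illustration.
final class GroupingMath {
  private GroupingMath() {}

  /**
   * Ceiling of numerator / denominator for non-negative numerator and positive denominator.
   * Uses quotient and remainder instead of (a + b - 1) / b, so the intermediate sum
   * cannot overflow a long.
   */
  static long ceilDiv(long numerator, long denominator) {
    if (numerator < 0 || denominator <= 0) {
      throw new IllegalArgumentException(
          "numerator must be >= 0 and denominator > 0, got " + numerator + " / " + denominator);
    }
    long quotient = numerator / denominator;
    return (numerator % denominator == 0) ? quotient : quotient + 1;
  }

  /** Same as ceilDiv, but clamps the result to Integer.MAX_VALUE instead of truncating it. */
  static int ceilDivToInt(long numerator, long denominator) {
    long result = ceilDiv(numerator, denominator);
    return result > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) result;
  }

  /**
   * Product of the per-source-vertex group counts (assumed non-negative), i.e. the
   * parallelism a cartesian product would get, clamped to Integer.MAX_VALUE.
   */
  static int totalGroups(int[] numGroupsPerVertex) {
    long product = 1;
    for (int numGroups : numGroupsPerVertex) {
      // product <= Integer.MAX_VALUE here, so this multiplication cannot overflow a long.
      product *= numGroups;
      if (product > Integer.MAX_VALUE) {
        return Integer.MAX_VALUE;
      }
    }
    return (int) product;
  }
}
```

With something like this, the snippet above would become ceilDivToInt(vertexOutputBytes[i], desiredBytesPerGroup), with vertexOutputBytes declared as long[]. A source vertex with zero output bytes yields zero groups in this sketch, so that case would still need an explicit decision.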

> Auto grouping for cartesian product edge(unpartitioned case)
> ------------------------------------------------------------
>
>                 Key: TEZ-3458
>                 URL: https://issues.apache.org/jira/browse/TEZ-3458
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: TEZ-3458.1.patch, TEZ-3458.2.patch, TEZ-3458.3.patch, 
> TEZ-3458.4.patch
>
>
> The original CartesianProductVertexManagerUnpartitioned sets parallelism to 
> the product of all source vertices' parallelism, which may explode to an 
> insane number. We should do auto reduce as in ShuffleVertexManager to avoid this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
