[ 
https://issues.apache.org/jira/browse/TEZ-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15668758#comment-15668758
 ] 

Zhiyuan Yang commented on TEZ-3458:
-----------------------------------

Thanks for the review!
{quote}
Grouper abstraction is nice. Maybe 
FairShuffleVertexManager#PartitionsGroupingCalculator should use that as well. 
If you agree, it can be moved out of cartesian product.
{quote}
I've moved it to 
/tez-runtime-library/src/main/java/org/apache/tez/runtime/library/utils. 

{quote}
The grouping is based on static per-source-vertex desiredBytesPerGroup, not the 
aggregated size across all source vertices. So applications will config the 
value based on the desired aggregate input size and the number of source 
vertices?
{quote}
Aggregated size across all source vertices isn't any closer to the true amount
of work than per-vertex grouping is. Ideally we want each task to have the same
number of combinations of input entries, but current stats only report data
size, not number of entries. So the best we can do is to assume that every
entry has the same size, which makes output size a good estimate of the number
of entries.
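
To illustrate the per-vertex grouping described above, here is a minimal sketch. It is not code from the patch: the class name, method names, and the rule of capping the group count at the number of source tasks are all assumptions made for illustration. Each source vertex gets roughly ceil(vertexOutputBytes / desiredBytesPerGroup) groups, and the cartesian product vertex's parallelism is the product of the per-vertex group counts.

```java
/**
 * Hypothetical sketch of per-source-vertex grouping; not the TEZ-3458 patch.
 * All names here are illustrative assumptions.
 */
final class CartesianGroupingSketch {

  /**
   * Group count for one source vertex: ceil(bytes / desiredBytesPerGroup),
   * at least 1, and (by assumption) never more than the number of source tasks.
   */
  static int numGroups(long vertexOutputBytes, long desiredBytesPerGroup, int numTasks) {
    if (vertexOutputBytes < 0 || desiredBytesPerGroup <= 0 || numTasks <= 0) {
      throw new IllegalArgumentException("invalid grouping input");
    }
    // Ceiling division written to avoid overflow for very large byte counts.
    long groups = vertexOutputBytes / desiredBytesPerGroup
        + (vertexOutputBytes % desiredBytesPerGroup == 0 ? 0 : 1);
    groups = Math.max(1L, groups);
    return (int) Math.min(groups, (long) numTasks);
  }

  /** Parallelism of the cartesian product vertex: product of per-vertex group counts. */
  static long parallelism(long[] outputBytes, int[] numTasks, long desiredBytesPerGroup) {
    if (outputBytes.length != numTasks.length) {
      throw new IllegalArgumentException("one entry per source vertex expected");
    }
    long product = 1L;
    for (int i = 0; i < outputBytes.length; i++) {
      product = Math.multiplyExact(
          product, (long) numGroups(outputBytes[i], desiredBytesPerGroup, numTasks[i]));
    }
    return product;
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    long[] bytes = {1000 * mb, 250 * mb}; // output size of each source vertex
    int[] tasks = {100, 50};              // ungrouped product would be 100 * 50 = 5000
    System.out.println(parallelism(bytes, tasks, 100 * mb)); // prints 30 (10 groups * 3 groups)
  }
}
```

In this sketch the byte counts are `long`, matching the review remark that vertexOutputBytes should be long. Because the same desiredBytesPerGroup applies to every source vertex, a vertex with more output simply ends up with more groups.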

{quote}
Wonder if there is any scenario to assign different values of 
desiredBytesPerGroup for different source vertex dynamically.
{quote}
Per-vertex config should be more accurate than a single static config. But a
single config is easy for users and good enough as a first step. Actually, if
we were going to support per-vertex config, I'd rather implement stats for the
number of entries and still keep a single config.

{quote}
vertexOutputBytes should be long
{quote}
Fixed.

{quote}
What if some source vertex doesn’t generate output?
{quote}
A source vertex should always generate output because it has an edge connected
to the cp vertex. The problem is that an output is not required to generate a
VertexManagerEvent on close, although all existing outputs do. With this in
mind, we can say that if a vertex manager depends on this behavior, it
shouldn't be used with outputs that don't generate VertexManagerEvent.

{quote}
What if the VME comes from some broadcast vertex?
{quote}
Nice catch. I forgot this case while rebasing. Thanks!

> Auto grouping for cartesian product edge(unpartitioned case)
> ------------------------------------------------------------
>
>                 Key: TEZ-3458
>                 URL: https://issues.apache.org/jira/browse/TEZ-3458
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Zhiyuan Yang
>            Assignee: Zhiyuan Yang
>         Attachments: TEZ-3458.1.patch, TEZ-3458.2.patch, TEZ-3458.3.patch, 
> TEZ-3458.4.patch, TEZ-3458.5.patch
>
>
> The original CartesianProductVertexManagerUnpartitioned sets parallelism to
> the product of all source vertices' parallelism, which may explode to an
> insane number. We should do auto reduce as in ShuffleVertexManager to avoid
> this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
