[ 
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527425#comment-14527425
 ] 

Rohini Palaniswamy edited comment on TEZ-2221 at 5/4/15 10:12 PM:
------------------------------------------------------------------

bq. what happens if someone does the following. This should also be disallowed. 
Correct?
{code}
dag.createVertexGroup("group_1", v1,v2);
dag.createVertexGroup("group_2", v1,v2);
{code}

[~daijy] pointed out that this breaks a lot of Pig scripts on Tez with 
UnionOptimizer, because each vertex can have multiple outputs and we currently 
create a vertex group for each of those outputs. For example, take union 
followed by order by: the union vertex has one sample output and one 
partitioner output, going to two different downstream vertices. With the 
UnionOptimizer, the union is removed and two vertex groups are created. If 
this is disallowed, we will have to reuse the same vertex group to route 
multiple outputs. The GroupInputEdge.create(VertexGroup inputVertexGroup, 
Vertex outputVertex, EdgeProperty edgeProperty, InputDescriptor mergedInput) 
API seems to allow that. Will doing that work, and is that how you want us to 
construct the plan?
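
To make the question concrete, here is a minimal sketch of the plan shape being asked about: one vertex group over (v1, v2) that feeds both the sampler and the partitioner. It is a toy model in plain Java, not the Tez API. The class GroupRoutingSketch, its methods, and the rule that a second group over the same members is rejected are all assumptions made for illustration. Whether Tez itself accepts several group edges from one vertex group is exactly the open question above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a DAG builder. None of this is the Tez API.
final class GroupRoutingSketch {
    // Member set -> group name, used to reject a second group over the same members
    // (assumed behavior, mirroring the "should also be disallowed" case).
    private final Map<Set<String>, String> groupsByMembers = new HashMap<>();
    private final List<String> groupEdges = new ArrayList<>();

    String createVertexGroup(String name, String... members) {
        Set<String> key = new TreeSet<>(Arrays.asList(members));
        String existing = groupsByMembers.putIfAbsent(key, name);
        if (existing != null) {
            throw new IllegalStateException(
                "members " + key + " already belong to group " + existing);
        }
        return name;
    }

    // Records one edge from a vertex group to a downstream vertex.
    void addGroupEdge(String group, String downstream) {
        groupEdges.add(group + "->" + downstream);
    }

    List<String> edges() {
        return groupEdges;
    }

    // Union followed by order by: a single group routes both outputs.
    static List<String> unionThenOrderBy() {
        GroupRoutingSketch dag = new GroupRoutingSketch();
        String group = dag.createVertexGroup("group_1", "v1", "v2");
        dag.addGroupEdge(group, "sampler");
        dag.addGroupEdge(group, "partitioner");
        return dag.edges();
    }

    public static void main(String[] args) {
        System.out.println(unionThenOrderBy());
    }
}
```

In this model the second createVertexGroup call from the quoted snippet would throw, so the only way to reach two downstream vertices is two edges from group_1.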

Consider another case: union followed by replicate join with two tables, 
followed by order by. The plan will consist of 8 vertices: V1 (Load) + V2 
(Load) + V3 (union) + V4 (Replicate join T1 load) + V5 (Replicate join T2 load) 
+ V6 (partitioner) + V7 (sampler) + V8 (order by), with edges V1,V2->V3, V4->V3, 
V5->V3, V3->V6, V3->V7, V7->V6, V6->V8. The optimized plan will become 
V4->(V1,V2 vertex group), V5->(V1,V2 vertex group), (V1,V2 vertex group)->V6, 
(V1,V2 vertex group)->V7, V7->V6, V6->V8. So is using one vertex group to route 
multiple outputs and multiple inputs how we are expected to construct the plan?
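
The edge rewrite described above can be checked mechanically. The sketch below is a toy in plain Java, not Pig's UnionOptimizer and not the Tez API. The class UnionCollapseSketch and the "(V1,V2)" label for the vertex group are hypothetical names chosen for illustration. It drops the member-to-union edges and redirects every other edge that touched the union vertex to the group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of removing a union vertex in favor of a vertex group.
final class UnionCollapseSketch {
    static List<String> collapseUnion(List<String[]> edges, String union,
                                      List<String> members) {
        String group = "(" + String.join(",", members) + ")";
        List<String> out = new ArrayList<>();
        for (String[] e : edges) {
            String src = e[0];
            String dst = e[1];
            // Member -> union edges disappear: the members now form the group.
            if (dst.equals(union) && members.contains(src)) {
                continue;
            }
            // Other inputs of the union become inputs of the group.
            if (dst.equals(union)) {
                dst = group;
            }
            // Outputs of the union become outputs of the group.
            if (src.equals(union)) {
                src = group;
            }
            out.add(src + "->" + dst);
        }
        return out;
    }

    // The 8-vertex plan from the paragraph above, with V3 as the union.
    static List<String> example() {
        List<String[]> edges = Arrays.asList(
            new String[] {"V1", "V3"}, new String[] {"V2", "V3"},
            new String[] {"V4", "V3"}, new String[] {"V5", "V3"},
            new String[] {"V3", "V6"}, new String[] {"V3", "V7"},
            new String[] {"V7", "V6"}, new String[] {"V6", "V8"});
        return collapseUnion(edges, "V3", Arrays.asList("V1", "V2"));
    }

    public static void main(String[] args) {
        // Prints the six edges of the optimized plan.
        System.out.println(example());
    }
}
```

The result has a single group with two incoming edges (from V4 and V5) and two outgoing edges (to V6 and V7), which is the "one vertex group for multiple outputs and multiple inputs" shape in question.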







> VertexGroup name should be unique
> ---------------------------------
>
>                 Key: TEZ-2221
>                 URL: https://issues.apache.org/jira/browse/TEZ-2221
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>             Fix For: 0.7.0, 0.5.4, 0.6.1
>
>         Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch, 
> TEZ-2221-4.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex 
> group name to identify the vertex group commit, so vertex groups with the same 
> name will conflict. However, the current equals & hashCode of VertexGroup use 
> both the vertex group name and the member names.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
