[
https://issues.apache.org/jira/browse/TEZ-2221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527425#comment-14527425
]
Rohini Palaniswamy edited comment on TEZ-2221 at 5/4/15 10:12 PM:
------------------------------------------------------------------
bq. what happens if someone does the following. This should also be disallowed.
Correct?
{code}
dag.createVertexGroup("group_1", v1,v2);
dag.createVertexGroup("group_2", v1,v2);
{code}
[~daijy] pointed out that this breaks a lot of Pig scripts on Tez that use the
UnionOptimizer. Each vertex can have multiple outputs, and we currently create a
vertex group for each of those outputs. For example, take a union followed by an
order by: the union vertex has one sample output and one partitioner output,
going to two different downstream vertices. With the UnionOptimizer, the
union is removed and two vertex groups with the same members are created. If
this is disallowed, we will have to reuse the same vertex group to route
multiple outputs. The
GroupInputEdge.create(VertexGroup inputVertexGroup, Vertex outputVertex,
EdgeProperty edgeProperty, InputDescriptor mergedInput) API seems to allow that.
Will doing that work, and is that how you want us to construct the plan?
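To make the question concrete, here is a minimal sketch of the shape being asked about: one group over (v1, v2) feeding both the sampler and the partitioner, with a second group over the same members rejected. GroupRoutingSketch and its methods are hypothetical stand-ins written for illustration, not the Tez API, and whether Tez itself accepts two group edges from one VertexGroup is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a DAG builder; this is NOT the Tez API.
final class GroupRoutingSketch {
    // Group name -> member vertex names.
    private final Map<String, Set<String>> groups = new LinkedHashMap<>();
    // Edges recorded as "source -> target" strings.
    private final List<String> edges = new ArrayList<>();

    // Registers a group, rejecting a duplicate name or a duplicate member set.
    String createVertexGroup(String name, String... members) {
        Set<String> memberSet = new LinkedHashSet<>(Arrays.asList(members));
        if (groups.containsKey(name)) {
            throw new IllegalStateException("duplicate group name: " + name);
        }
        if (groups.containsValue(memberSet)) {
            throw new IllegalStateException("members already grouped: " + memberSet);
        }
        groups.put(name, memberSet);
        return name;
    }

    // Routes one output of the group to a downstream vertex.
    void addGroupEdge(String groupName, String targetVertex) {
        if (!groups.containsKey(groupName)) {
            throw new IllegalArgumentException("unknown group: " + groupName);
        }
        edges.add(groupName + " -> " + targetVertex);
    }

    List<String> edges() {
        return Collections.unmodifiableList(edges);
    }

    public static void main(String[] args) {
        GroupRoutingSketch dag = new GroupRoutingSketch();
        String group = dag.createVertexGroup("group_1", "v1", "v2");
        // The same group is reused for both outputs of the removed union.
        dag.addGroupEdge(group, "sampler");
        dag.addGroupEdge(group, "partitioner");
        System.out.println(dag.edges());
        try {
            dag.createVertexGroup("group_2", "v1", "v2");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```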
Consider another case: a union followed by a replicate join with two tables,
followed by an order by. The plan will consist of 8 vertices: V1 (Load), V2
(Load), V3 (union), V4 (Replicate join T1 load), V5 (Replicate join T2 load),
V6 (partitioner), V7 (sampler) and V8 (order by), with edges V1->V3, V2->V3,
V4->V3, V5->V3, V3->V6, V3->V7, V7->V6, V6->V8. The optimized plan will become
V4->(V1,V2 vertex group), V5->(V1,V2 vertex group), (V1,V2 vertex group)->V6,
(V1,V2 vertex group)->V7, V7->V6, V6->V8. So is using one vertex group to route
multiple outputs and multiple inputs how we are expected to construct the
plan?
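The rewrite from the original plan to the optimized one can be sketched as a plain edge transformation. The class and method names below are hypothetical and written only for illustration; this is not how Pig's UnionOptimizer is implemented. It drops the member-to-union edges, retargets the other inputs of the union to the group, and re-sources the union's outputs from the group.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the plan rewrite; not Pig or Tez code.
final class UnionRewriteSketch {
    // Replaces the union vertex with a group made of its member inputs.
    static List<String[]> removeUnion(List<String[]> edges, String union,
                                      List<String> members, String group) {
        List<String[]> rewritten = new ArrayList<>();
        for (String[] edge : edges) {
            String from = edge[0];
            String to = edge[1];
            if (to.equals(union) && members.contains(from)) {
                continue; // member -> union edges are absorbed by the group
            }
            if (to.equals(union)) {
                to = group; // other inputs of the union now feed the group
            }
            if (from.equals(union)) {
                from = group; // outputs of the union now come from the group
            }
            rewritten.add(new String[] {from, to});
        }
        return rewritten;
    }

    // Formats edges as "A->B, C->D".
    static String describe(List<String[]> edges) {
        List<String> parts = new ArrayList<>();
        for (String[] edge : edges) {
            parts.add(edge[0] + "->" + edge[1]);
        }
        return String.join(", ", parts);
    }

    // The 8-vertex plan from the example above.
    static List<String[]> originalPlan() {
        return Arrays.asList(
            new String[] {"V1", "V3"}, new String[] {"V2", "V3"},
            new String[] {"V4", "V3"}, new String[] {"V5", "V3"},
            new String[] {"V3", "V6"}, new String[] {"V3", "V7"},
            new String[] {"V7", "V6"}, new String[] {"V6", "V8"});
    }

    public static void main(String[] args) {
        List<String[]> optimized =
            removeUnion(originalPlan(), "V3", Arrays.asList("V1", "V2"), "(V1,V2)");
        System.out.println(describe(optimized));
    }
}
```

In this sketch a single group, (V1,V2), ends up as the target of two edges and the source of two more, which is the "multiple outputs and multiple inputs" situation the question is about.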
> VertexGroup name should be unique
> ---------------------------------
>
> Key: TEZ-2221
> URL: https://issues.apache.org/jira/browse/TEZ-2221
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Fix For: 0.7.0, 0.5.4, 0.6.1
>
> Attachments: TEZ-2221-1.patch, TEZ-2221-2.patch, TEZ-2221-3.patch,
> TEZ-2221-4.patch
>
>
> VertexGroupCommitStartedEvent & VertexGroupCommitFinishedEvent use the vertex
> group name to identify the vertex group commit, so vertex groups with the same
> name will conflict. However, the current equals & hashCode of VertexGroup use
> both the vertex group name and the member names.
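The mismatch described in the issue can be illustrated with a small sketch. The Group class below is a hypothetical stand-in whose equals and hashCode use the name and the members, and the name-keyed map is an assumption about how name-only identification would collide; neither is Tez code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration of the identity mismatch; not Tez code.
final class GroupIdentitySketch {
    // Stand-in for a vertex group whose identity is name plus members.
    static final class Group {
        final String name;
        final Set<String> members;

        Group(String name, String... members) {
            this.name = name;
            this.members = new HashSet<>(Arrays.asList(members));
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Group)) return false;
            Group other = (Group) o;
            return name.equals(other.name) && members.equals(other.members);
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, members);
        }
    }

    // Counts groups that are distinct according to equals/hashCode.
    static int distinctByEquals(Group... groups) {
        return new HashSet<>(Arrays.asList(groups)).size();
    }

    // Counts groups that stay distinct when only the name is the key.
    static int distinctByName(Group... groups) {
        Map<String, Group> byName = new HashMap<>();
        for (Group g : groups) {
            byName.put(g.name, g); // a later group overwrites an earlier one
        }
        return byName.size();
    }

    public static void main(String[] args) {
        Group a = new Group("group_1", "v1", "v2");
        Group b = new Group("group_1", "v3", "v4");
        System.out.println("distinct by equals: " + distinctByEquals(a, b));
        System.out.println("distinct by name:   " + distinctByName(a, b));
    }
}
```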