[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507768#comment-15507768 ]
Travis Woodruff commented on PIG-5033:
--------------------------------------
Here's the DAG plan. I only started testing Tez today, so I'm not very
familiar with how these should look, but the fact that both FRJoins
(scope-48 and scope-34) read from scope-61 seems a bit suspicious.
{code}
Tez vertex scope-55 -> Tez vertex scope-57,
Tez vertex scope-56 -> Tez vertex scope-57,
Tez vertex scope-61 -> Tez vertex scope-57,
Tez vertex scope-57
Tez vertex scope-55
# Plan on vertex
POValueOutputTez - scope-59 -> [scope-57]
|
|---a: New For Each(false,false)[bag] - scope-7
| |
| Cast[int] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[int] - scope-5
| |
| |---Project[bytearray][1] - scope-4
|
|---a: Load(file:///tmp/input1:org.apache.pig.builtin.PigStorage) - scope-0
Tez vertex scope-56
# Plan on vertex
POValueOutputTez - scope-60 -> [scope-57]
|
|---b: New For Each(false,false)[bag] - scope-15
| |
| Cast[int] - scope-10
| |
| |---Project[bytearray][0] - scope-9
| |
| Cast[int] - scope-13
| |
| |---Project[bytearray][1] - scope-12
|
|---b: Load(file:///tmp/input2:org.apache.pig.builtin.PigStorage) - scope-8
Tez vertex scope-61
# Plan on vertex
c: Split - scope-67
| |
| Local Rearrange[tuple]{int}(false) - scope-37 -> scope-57
| | |
| | Project[int][0] - scope-33
| |
| |---e: Filter[bag] - scope-28
| | |
| | Greater Than[boolean] - scope-31
| | |
| | |---Project[int][1] - scope-29
| | |
| | |---Constant(3) - scope-30
| |
| Local Rearrange[tuple]{int}(false) - scope-51 -> scope-57
| | |
| | Project[int][0] - scope-47
| |
| |---f: Filter[bag] - scope-42
| | |
| | Less Than[boolean] - scope-45
| | |
| | |---Project[int][1] - scope-43
| | |
| | |---Constant(2) - scope-44
|
|---c: New For Each(false,false)[bag] - scope-24
| |
| Cast[int] - scope-19
| |
| |---Project[bytearray][0] - scope-18
| |
| Cast[int] - scope-22
| |
| |---Project[bytearray][1] - scope-21
|
|---c: Load(file:///tmp/input3:org.apache.pig.builtin.PigStorage) - scope-17
Tez vertex scope-57
# Plan on vertex
h: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-54
|
|---h: FRJoin[tuple] - scope-48 <- scope-61
| |
| Project[int][0] - scope-46
| |
| Project[int][0] - scope-47
|
|---g: FRJoin[tuple] - scope-34 <- scope-61
| |
| Project[int][0] - scope-32
| |
| Project[int][0] - scope-33
|
|---POShuffledValueInputTez - scope-58 <- [scope-55, scope-56]
{code}
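One way to make the observation about scope-61 checkable is to scan the plan text for FRJoin operators and group them by the vertex they read from. The following is only a rough Python sketch: the regular expression assumes the line shape seen in this particular dump, and the helper names (frjoin_sources, shared_sources) are made up for illustration. A shared input vertex is not necessarily wrong by itself; the script only surfaces the pattern.

```python
import re
from collections import defaultdict

# Excerpt of the scope-57 vertex plan quoted above.
PLAN = """\
h: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-54
|
|---h: FRJoin[tuple] - scope-48 <- scope-61
|
|---g: FRJoin[tuple] - scope-34 <- scope-61
|
|---POShuffledValueInputTez - scope-58 <- [scope-55, scope-56]
"""

# Assumed line shape: "<alias>: FRJoin[<type>] - scope-N <- scope-M".
FRJOIN = re.compile(r"(\w+): FRJoin\[\w+\] - (scope-\d+) <- (scope-\d+)")


def frjoin_sources(plan_text):
    """Map each source vertex to the FRJoin aliases that read from it."""
    sources = defaultdict(list)
    for alias, _operator, source in FRJOIN.findall(plan_text):
        sources[source].append(alias)
    return dict(sources)


def shared_sources(plan_text):
    """Keep only the source vertices read by more than one FRJoin."""
    return {
        source: aliases
        for source, aliases in frjoin_sources(plan_text).items()
        if len(aliases) > 1
    }


if __name__ == "__main__":
    print(shared_sources(PLAN))  # {'scope-61': ['h', 'g']}
```

Running the same check against the plan produced with opt.multiquery=false would show whether the two joins get separate inputs in the working case.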
> MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
> --------------------------------------------------------------------
>
> Key: PIG-5033
> URL: https://issues.apache.org/jira/browse/PIG-5033
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.16.0
> Reporter: Travis Woodruff
>
> This script produces incorrect results:
> {code}
> a = load 'file:///tmp/input1' as (x:int, y:int);
> b = load 'file:///tmp/input2' as (x:int, y:int);
> u = union a,b;
> c = load 'file:///tmp/input3' as (x:int, y:int);
> e = filter c by y > 3;
> f = filter c by y < 2;
> g = join u by x left, e by x using 'replicated';
> h = join g by u::x left, f by x using 'replicated';
> store h into 'file:///tmp/pigoutput';
> {code}
> Without the union, or with opt.multiquery=false, or with non-replicated
> joins, it works as expected.
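To judge whether a given run is "incorrect", it can help to have a reference result computed outside Pig. Below is a small Python sketch of what the script is expected to compute. The input rows are hypothetical (the issue does not include the real files), and the sketch ignores Pig's null-key handling and row ordering, so it is only a sanity check, not a statement of how Pig evaluates the plan.

```python
def left_join(left, right, left_key, right_key, right_width):
    """Left outer join of two lists of tuples.

    Left rows without a match are padded with right_width None values.
    """
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        matches = index.get(row[left_key])
        if matches:
            joined.extend(row + match for match in matches)
        else:
            joined.append(row + (None,) * right_width)
    return joined


# Hypothetical contents of input1, input2 and input3 as (x, y) rows.
a = [(1, 10), (2, 20)]
b = [(3, 30)]
c = [(1, 5), (2, 1), (3, 0), (3, 4)]

u = a + b                          # u = union a,b;
e = [row for row in c if row[1] > 3]   # e = filter c by y > 3;
f = [row for row in c if row[1] < 2]   # f = filter c by y < 2;
g = left_join(u, e, 0, 0, 2)       # g = join u by x left, e by x
h = left_join(g, f, 0, 0, 2)       # h = join g by u::x left, f by x

if __name__ == "__main__":
    for row in h:
        print(row)
```

With these made-up rows, each row of h should carry columns from e only where y > 3 and columns from f only where y < 2; a row that shows the same c record on both sides would point at the two joins sharing one input.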