[ https://issues.apache.org/jira/browse/PIG-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15507776#comment-15507776 ]
Travis Woodruff commented on PIG-5033:
--------------------------------------
And here's the plan when I remove the union (which gives correct results). The
difference seems to be that it leaves the right-side input of one of the joins
(f, in vertex scope-51) in a separate vertex.
{code}
Tez vertex scope-47 -> Tez vertex scope-46,Tez vertex scope-51,
Tez vertex scope-51 -> Tez vertex scope-46,
Tez vertex scope-46
Tez vertex scope-47
# Plan on vertex
c: Split - scope-53
| |
| Local Rearrange[tuple]{int}(false) - scope-28 -> scope-46
| | |
| | Project[int][0] - scope-24
| |
| |---e: Filter[bag] - scope-19
| | |
| | Greater Than[boolean] - scope-22
| | |
| | |---Project[int][1] - scope-20
| | |
| | |---Constant(3) - scope-21
| |
| POValueOutputTez - scope-48 -> [scope-51]
|
|---c: New For Each(false,false)[bag] - scope-15
| |
| Cast[int] - scope-10
| |
| |---Project[bytearray][0] - scope-9
| |
| Cast[int] - scope-13
| |
| |---Project[bytearray][1] - scope-12
|
|---c: Load(file:///tmp/input3:org.apache.pig.builtin.PigStorage) - scope-8
Tez vertex scope-51
# Plan on vertex
Local Rearrange[tuple]{int}(false) - scope-42 -> scope-46
| |
| Project[int][0] - scope-38
|
|---f: Filter[bag] - scope-33
| |
| Less Than[boolean] - scope-36
| |
| |---Project[int][1] - scope-34
| |
| |---Constant(2) - scope-35
|
|---POValueInputTez - scope-52 <- scope-47
Tez vertex scope-46
# Plan on vertex
h: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-45
|
|---h: FRJoin[tuple] - scope-39 <- scope-51
| |
| Project[int][0] - scope-37
| |
| Project[int][0] - scope-38
|
|---g: FRJoin[tuple] - scope-25 <- scope-47
| |
| Project[int][0] - scope-23
| |
| Project[int][0] - scope-24
|
|---a: New For Each(false,false)[bag] - scope-7
| |
| Cast[int] - scope-2
| |
| |---Project[bytearray][0] - scope-1
| |
| Cast[int] - scope-5
| |
| |---Project[bytearray][1] - scope-4
|
|---a: Load(file:///tmp/input1:org.apache.pig.builtin.PigStorage) - scope-0
{code}
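The union-free script itself is not included in this comment. A plausible reconstruction is shown here. It is an assumption inferred from the aliases and operators in the plan above and may not be the exact script that was used. Run under the Tez exec type, it should produce a plan of this shape, although the scope numbers will differ:
{code}
-- Hypothetical reconstruction: the repro script with the union removed,
-- so the left side of the first join is 'a' directly instead of 'u'.
a = load 'file:///tmp/input1' as (x:int, y:int);
c = load 'file:///tmp/input3' as (x:int, y:int);
e = filter c by y > 3;
f = filter c by y < 2;
g = join a by x left, e by x using 'replicated';
-- With no union, the join key from the first join is a::x rather than u::x.
h = join g by a::x left, f by x using 'replicated';
-- explain (instead of store) is what yields the Store(fakefile:...) operator
-- seen in vertex scope-46.
explain h;
{code}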
> MultiQueryOptimizerTez creates bad plan with union, split and FRJoin
> --------------------------------------------------------------------
>
> Key: PIG-5033
> URL: https://issues.apache.org/jira/browse/PIG-5033
> Project: Pig
> Issue Type: Bug
> Components: tez
> Affects Versions: 0.16.0
> Reporter: Travis Woodruff
>
> This script produces incorrect results:
> {code}
> a = load 'file:///tmp/input1' as (x:int, y:int);
> b = load 'file:///tmp/input2' as (x:int, y:int);
> u = union a,b;
> c = load 'file:///tmp/input3' as (x:int, y:int);
> e = filter c by y > 3;
> f = filter c by y < 2;
> g = join u by x left, e by x using 'replicated';
> h = join g by u::x left, f by x using 'replicated';
> store h into 'file:///tmp/pigoutput';
> {code}
> Without the union, or with opt.multiquery=false, or with non-replicated
> joins, it works as expected.
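The description above says the script gives correct results when the joins are not replicated. The following sketch shows that variant. It assumes the same three inputs and changes only the join strategy. It has not been re-run here:
{code}
a = load 'file:///tmp/input1' as (x:int, y:int);
b = load 'file:///tmp/input2' as (x:int, y:int);
u = union a,b;
c = load 'file:///tmp/input3' as (x:int, y:int);
e = filter c by y > 3;
f = filter c by y < 2;
-- Workaround sketch: drop "using 'replicated'" so both left outer joins
-- fall back to Pig's default (shuffle-based) join instead of FRJoin.
g = join u by x left, e by x;
h = join g by u::x left, f by x;
store h into 'file:///tmp/pigoutput';
{code}
This replaces the map-side replicated joins with shuffle joins, so it is a workaround rather than a fix for the multi-query plan.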
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)