[
https://issues.apache.org/jira/browse/PIG-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4690:
------------------------------------
Assignee: Rohini Palaniswamy
Fix Version/s: 0.16.0
Description:
Found this while trying to get a reproducible script for a different issue. Not
a user reported one, but a possibility nonetheless.
A = LOAD 'x';
B = LOAD 'y';
C = UNION A, B;
D = JOIN C, A using 'repl';
DUMP D;
was:
Found this while trying to get a reproducible script for a different issue. It
is not user reported, as such a script is unlikely in practice. Script to
reproduce:
A = LOAD 'x';
B = LOAD 'y';
C = UNION A, B;
D = JOIN C, A using 'repl';
DUMP D;
Real world scripts most likely have some foreach or filter statements before
the self-join.
Had another user-reported issue where the user had a cross and a scalar from
the same Split. The script was actually wrong, as it was using the . operator
instead of the :: operator. But since both inputs had 1 record, the script
worked and produced the right results. The script failed to run in Tez with
"java.lang.IllegalArgumentException: bound must be positive" because the
parallelism of the cross vertex was set to 0. The main problem was not
parallelism estimation but the planning, where the shuffle vertex for Cross was
overwritten with the broadcast vertex for the scalar.
Multi-query planning should differentiate between POPackage inputs and
non-POPackage inputs. A splittee should be merged if its inputs are either
POPackage or non-POPackage, and not a mix of both.
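
The merge rule stated above can be sketched as a small predicate. This is only
an illustration of the rule, not Pig's planner code: the SplitteeInput type,
its feeds_package flag and the can_merge_splittee function are hypothetical
names introduced here, and treating a splittee with no inputs as not mergeable
is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class SplitteeInput:
    """Hypothetical stand-in for one input of a splittee (not a Pig class)."""
    name: str
    feeds_package: bool  # True if the input is consumed by a POPackage (shuffle)


def can_merge_splittee(inputs: Iterable[SplitteeInput]) -> bool:
    """Return True only when every input is of the same kind.

    Mixed POPackage and non-POPackage inputs are rejected. An empty input
    list is also rejected (an assumption made for this sketch).
    """
    kinds = {inp.feeds_package for inp in inputs}
    return len(kinds) == 1


# Cross needs a shuffle (POPackage) input while the scalar is a broadcast
# (non-POPackage) input, so under this rule the splittee is not merged.
cross_and_scalar = [
    SplitteeInput("cross", feeds_package=True),
    SplitteeInput("scalar", feeds_package=False),
]
mixed_ok = can_merge_splittee(cross_and_scalar)

# Two shuffle inputs are of one kind, so the splittee may be merged.
shuffle_only_ok = can_merge_splittee([
    SplitteeInput("join_left", feeds_package=True),
    SplitteeInput("join_right", feeds_package=True),
])
```

With the cross + scalar case described above, the predicate returns False, so
the shuffle vertex and the broadcast vertex would stay separate.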
> Union with self replicate join will fail in Tez
> -----------------------------------------------
>
> Key: PIG-4690
> URL: https://issues.apache.org/jira/browse/PIG-4690
> Project: Pig
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: 0.16.0
>
>
> Found this while trying to get a reproducible script for a different issue.
> Not a user reported one, but a possibility nonetheless.
> A = LOAD 'x';
> B = LOAD 'y';
> C = UNION A, B;
> D = JOIN C, A using 'repl';
> DUMP D;