[ 
https://issues.apache.org/jira/browse/PIG-4690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4690:
------------------------------------
         Assignee: Rohini Palaniswamy
    Fix Version/s: 0.16.0
      Description: 
Found this while trying to write a script to reproduce a different issue. It 
was not reported by a user, but such a script is possible nonetheless. 

A = LOAD 'x';
B = LOAD 'y';
C = UNION A, B;
D = JOIN C, A using 'repl';
DUMP D;

  was:
Found this while trying to write a script to reproduce a different issue. It 
was not reported by a user, as such a script is unlikely in practice. Script 
to reproduce: 

A = LOAD 'x';
B = LOAD 'y';
C = UNION A, B;
D = JOIN C, A using 'repl';
DUMP D;

Real-world scripts most likely have some FOREACH or FILTER statements before 
the self-join.


Another user reported an issue where the script had a cross and a scalar from 
the same Split. The script was actually wrong, as it used the . operator 
instead of ::. But since both inputs had 1 record, the script worked and 
produced the right results. The script failed to run in Tez with 
"java.lang.IllegalArgumentException: bound must be positive" because the 
parallelism of the cross vertex was set to 0. The main problem was not 
parallelism estimation but planning, where the shuffle vertex for the Cross 
was overwritten with the broadcast vertex for the scalar.

Multi-query planning should differentiate between POPackage inputs and 
non-POPackage inputs. A splittee should be merged if it has either a POPackage 
input or a non-POPackage input, but not both.
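
The merge rule above can be sketched as a small predicate. This is only an 
illustration in Python, not the actual Pig planner code: the SplitteeInput 
class, its fields, and the can_merge_splittee function are hypothetical names, 
and treating a splittee with no inputs as not mergeable is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SplitteeInput:
    """Hypothetical stand-in for one input edge of a splittee."""
    name: str
    is_package: bool  # True if the input feeds a POPackage (shuffle input)


def can_merge_splittee(inputs: List[SplitteeInput]) -> bool:
    """Return True if the splittee's inputs are all of one kind.

    Mixing POPackage and non-POPackage inputs (for example a shuffle
    input for a cross and a broadcast input for a scalar) is rejected.
    """
    kinds = {inp.is_package for inp in inputs}
    # Exactly one kind present: only POPackage or only non-POPackage.
    # Assumption: a splittee with no inputs is not merged.
    return len(kinds) == 1


if __name__ == "__main__":
    shuffle_only = [SplitteeInput("cross", True)]
    mixed = [SplitteeInput("cross", True), SplitteeInput("scalar", False)]
    print(can_merge_splittee(shuffle_only))
    print(can_merge_splittee(mixed))
```

In the cross + scalar case described above, the splittee would have one input 
of each kind, so under this sketch it would be left unmerged.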



> Union with self replicate join will fail in Tez
> -----------------------------------------------
>
>                 Key: PIG-4690
>                 URL: https://issues.apache.org/jira/browse/PIG-4690
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>
> Found this while trying to write a script to reproduce a different issue. 
> It was not reported by a user, but such a script is possible nonetheless. 
> A = LOAD 'x';
> B = LOAD 'y';
> C = UNION A, B;
> D = JOIN C, A using 'repl';
> DUMP D;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
