[ 
https://issues.apache.org/jira/browse/PIG-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-4574:
------------------------------------
    Attachment: PIG-4574-1.patch

Review board link - https://reviews.apache.org/r/35491/

The patch makes the Partitioner vertex read order by/skewed join data from 
HDFS, instead of getting it from the sampler vertex.

This jira does not optimize the following case:

A = LOAD 'x' ...;
B = LOAD 'y' ...;
C = UNION A, B;
D = ORDER C BY ..;

That case depends on UnionOptimizer being turned on and will need more 
changes, so I will leave it for another jira.
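
For contrast, scripts of the shape described in the issue (ORDER BY or skewed 
join directly after LOAD or LOAD+FOREACH, with no FILTER in between) might look 
like the sketch below. The paths, schemas and field names are made up for 
illustration and are not taken from the patch.

-- ORDER BY immediately after LOAD+FOREACH
A = LOAD 'x' AS (id:int, name:chararray);
B = FOREACH A GENERATE id, name;
C = ORDER B BY id;

-- skewed join immediately after LOAD
D = LOAD 'y' AS (id:int, value:chararray);
E = LOAD 'z' AS (id:int, score:double);
F = JOIN D BY id, E BY id USING 'skewed';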

> Eliminate identity vertex for order by and skewed join right after LOAD
> -----------------------------------------------------------------------
>
>                 Key: PIG-4574
>                 URL: https://issues.apache.org/jira/browse/PIG-4574
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4574-1.patch
>
>
> If ORDER BY or SKEWED JOIN is the operator immediately following LOAD+FOREACH 
> without any FILTER, then data should be read again from HDFS for the 
> partitioner vertex instead of writing to an identity vertex and reading from 
> it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
