[
https://issues.apache.org/jira/browse/PIG-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rohini Palaniswamy updated PIG-4574:
------------------------------------
Attachment: PIG-4574-1.patch
Review board link - https://reviews.apache.org/r/35491/
This patch reads the ORDER BY/skewed join data from HDFS in the partitioner
vertex instead of getting it from the sampler vertex.
This jira does not optimize the following case:
A = LOAD 'x' ...;
B = LOAD 'y' ...;
C = UNION A, B;
D = ORDER C BY ..;
Optimizing that case depends on UnionOptimizer being turned on and will need
more changes, so I will leave it for another jira.
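For contrast, a minimal sketch of the script shapes the issue description
targets: ORDER BY or a skewed join directly after LOAD+FOREACH, with no FILTER
in between. The field names and schemas are hypothetical and chosen only for
illustration.

```
-- Hypothetical schema; only the operator sequence matters here.
A = LOAD 'x' AS (id:int, name:chararray);
B = FOREACH A GENERATE id, name;
-- ORDER BY directly after LOAD+FOREACH, no FILTER in between.
C = ORDER B BY id;
-- Skewed join directly after LOAD+FOREACH.
D = LOAD 'y' AS (id:int, value:chararray);
E = JOIN B BY id, D BY id USING 'skewed';
```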
> Eliminate identity vertex for order by and skewed join right after LOAD
> -----------------------------------------------------------------------
>
> Key: PIG-4574
> URL: https://issues.apache.org/jira/browse/PIG-4574
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
> Fix For: 0.16.0
>
> Attachments: PIG-4574-1.patch
>
>
> If ORDER BY or SKEWED JOIN is the operator immediately following LOAD+FOREACH
> without any FILTER, then data should be read again from HDFS for the
> partitioner vertex instead of writing to an identity vertex and reading from
> it.