[
https://issues.apache.org/jira/browse/PIG-3850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Daniel Dai updated PIG-3850:
----------------------------
Fix Version/s: (was: tez-branch)
0.14.0
> Optimize join followed by order by using same key
> -------------------------------------------------
>
> Key: PIG-3850
> URL: https://issues.apache.org/jira/browse/PIG-3850
> Project: Pig
> Issue Type: Sub-task
> Components: tez
> Reporter: Rohini Palaniswamy
> Fix For: 0.14.0
>
>
> Possible optimizations:
> 1) If it is a skewed join, then we can combine ordering into it instead
> of doing a additional orderby as we skewed join already involves sampling.
> 2) If it is a normal join, then we can do the order by and then join. i.e
> Current plan:
> Vertex 1 (load massive), Vertex 2 (load big) -> Vertex 3 (join) -> Vertex 4
> (sampler), Vertex 5 (Partitioner using vertex 4 sample) -> Vertex 6 (order by)
> New plan:
> Vertex 1 (load massive) -> Vertex 2 (sampler), Vertex 3 (Partitioner using
> vertex 2 sample) -> Vertex 4 (order by and join) <- Vertex 5 (load big and
> construct WeightedRangePartitioner from Vertex 2 sample)
> 3) If it is a replicated join, similar plan in 2) should work with Vertex
> 5 changing to broadcast input to Vertex 4 instead of using
> WeightedRangePartitioner.
--
This message was sent by Atlassian JIRA
(v6.2#6252)