[
https://issues.apache.org/jira/browse/PIG-3846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996993#comment-13996993
]
Daniel Dai commented on PIG-3846:
---------------------------------
Summary of changes:
1. TezOperDependencyParallelismEstimator: estimates the parallelism of a vertex
from the parallelism of its predecessors and the operators in the predecessors'
physical plans.
2. PigOrderByVertexManager: a VertexManagerPlugin for the sort vertex of an
order by. It receives an event from the partition node and automatically
decreases the parallelism of the sort vertex (TEZ-1107 prevents increasing the
parallelism of the sort job).
3. Changes to POReservoirSample, FindQuantilesTez, WeightedRangePartitionerTez,
and PigProcessor to support PigOrderByVertexManager: FindQuantilesTez estimates
numQuantiles from the samples sent by POReservoirSample (which include stats of
the previous job), WeightedRangePartitionerTez partitions the incoming data
into the estimated numQuantiles partitions, and PigProcessor sends
numQuantiles to PigOrderByVertexManager.
4. Set the auto-parallelism flag of ShuffleVertexManager to true for applicable
vertices.
5. Add estimatedParallelism to TezOperator. If requestedParallelism is not set,
TezOperDependencyParallelismEstimator estimates the parallelism and instructs
the VertexManager to determine the final parallelism dynamically.
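The estimation rule in items 1 and 5 can be illustrated with a small sketch. It is not Pig's TezOperDependencyParallelismEstimator: the operator names, the per-operator scaling factors in `OPERATOR_FACTORS`, the cap `MAX_PARALLELISM`, and the convention that a non-positive requestedParallelism means "not set" are all hypothetical assumptions chosen for illustration. The sketch sums each predecessor's parallelism, scaled by factors for the operators in that predecessor's plan, and uses the estimate (flagged for runtime adjustment) only when no parallelism was requested.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical output/input ratios per operator type; NOT Pig's real constants.
OPERATOR_FACTORS = {
    "filter": 0.5,   # assume a filter drops about half of its input
    "limit": 0.1,    # assume a limit keeps very little data
    "flatten": 2.0,  # assume a flatten roughly doubles its input
}
MAX_PARALLELISM = 999  # hypothetical upper bound on estimated tasks


@dataclass
class Predecessor:
    """A predecessor vertex: its task count and the operators in its plan."""
    parallelism: int
    operators: List[str] = field(default_factory=list)


def estimate_parallelism(preds: List[Predecessor],
                         max_parallelism: int = MAX_PARALLELISM) -> int:
    """Sum predecessor parallelism, scaled by the operators each one runs."""
    total = 0.0
    for pred in preds:
        factor = 1.0
        for op in pred.operators:
            # Unknown operators are assumed not to change the data volume.
            factor *= OPERATOR_FACTORS.get(op, 1.0)
        total += pred.parallelism * factor
    # Round up and keep the result within [1, max_parallelism].
    return max(1, min(max_parallelism, math.ceil(total)))


def resolve_parallelism(requested: int,
                        preds: List[Predecessor]) -> Tuple[int, bool]:
    """Return (parallelism, let_vertex_manager_adjust).

    An explicit request wins; otherwise use the estimate and let the
    VertexManager refine it at runtime.
    """
    if requested > 0:
        return requested, False
    return estimate_parallelism(preds), True
```

Under these made-up factors, a vertex fed by a 10-task predecessor containing a filter and a 4-task predecessor with no scaled operators gets ceil(10 × 0.5 + 4) = 9 tasks, and that number is only a starting point for the VertexManager.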
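The order-by flow in items 2 and 3 can be sketched as a simplified, hypothetical model; it is not the code of FindQuantilesTez, WeightedRangePartitionerTez or PigOrderByVertexManager. The sizing rule in `estimate_num_quantiles` (input bytes divided by a bytes-per-reducer setting) is an assumption about how "stats of the previous job" might be used, `range_partition` is a plain binary search over the split keys that omits whatever weighting WeightedRangePartitionerTez applies, and `new_sort_parallelism` only models the "decrease only" constraint from TEZ-1107.

```python
import bisect
from typing import List, Sequence


def estimate_num_quantiles(total_input_bytes: int, bytes_per_reducer: int,
                           max_reducers: int) -> int:
    """Hypothetical sizing rule: one sort task per bytes_per_reducer of input."""
    needed = -(-total_input_bytes // bytes_per_reducer)  # integer ceiling
    return max(1, min(max_reducers, needed))


def find_quantiles(samples: Sequence, num_quantiles: int) -> List:
    """Choose num_quantiles - 1 split keys from the sorted sample."""
    ordered = sorted(samples)
    if num_quantiles <= 1 or not ordered:
        return []
    step = len(ordered) / num_quantiles
    return [ordered[min(len(ordered) - 1, int(i * step))]
            for i in range(1, num_quantiles)]


def range_partition(key, quantiles: Sequence) -> int:
    """Plain (unweighted) range partitioning: returns 0..len(quantiles)."""
    return bisect.bisect_left(quantiles, key)


def new_sort_parallelism(configured: int, num_quantiles: int) -> int:
    """The sort vertex can only shrink (TEZ-1107), so never exceed configured."""
    return min(configured, num_quantiles)
```

Because every key in partition i sorts before every key in partition i + 1, each sort task can sort its partition independently and the outputs concatenate into a total order; reporting numQuantiles to the vertex manager lets it drop sort tasks that would otherwise receive no range.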
> Implement automatic reducer parallelism
> ---------------------------------------
>
> Key: PIG-3846
> URL: https://issues.apache.org/jira/browse/PIG-3846
> Project: Pig
> Issue Type: Sub-task
> Reporter: Rohini Palaniswamy
> Assignee: Daniel Dai
> Fix For: tez-branch
>
> Attachments: PIG-3846-1.patch, PIG-3846-3.patch
>
>
> Tez has it built in. We can start by reusing it and then look at
> customization for better performance.