[ 
https://issues.apache.org/jira/browse/TEZ-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma updated TEZ-3269:
-------------------------
    Attachment: TEZ-3269.patch

Here is the draft patch. It supports two policies w.r.t. fair routing.

* One policy is auto reduce, somewhat similar to ShuffleVertexManager where the 
number of partitions can be reduced based on data size. Instead of using the 
overall size as ShuffleVertexManager does, it uses per-partition stats, e.g. 
TEZ-2962.
* Another routing policy is fair routing. Any destination task can fetch a 
consecutive range of partitions from a consecutive range of source tasks. Note 
that the patch only supports one bipartite edge. Supporting more than one 
bipartite edge requires more work. We can open another jira if we need to 
support that.
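
To make the two policies concrete, here is a minimal Java sketch. It is not 
code from TEZ-3269.patch: the class and method names, the greedy grouping 
rule, and the idea of splitting one large partition across several ranges of 
source tasks are all assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is NOT code from TEZ-3269.patch.
final class FairRoutingSketch {

  /** Half-open range [start, end) of partition indexes or source task indexes. */
  static final class Range {
    final int start;
    final int end;

    Range(int start, int end) {
      this.start = start;
      this.end = end;
    }

    boolean contains(int i) {
      return i >= start && i < end;
    }
  }

  /**
   * Auto reduce (assumed greedy rule): merge consecutive partitions until adding
   * the next one would push the group past desiredSizeMB. Each resulting group
   * would be handled by one destination task.
   */
  static List<Range> groupPartitions(int[] partitionSizesMB, int desiredSizeMB) {
    List<Range> groups = new ArrayList<>();
    int start = 0;
    long current = 0;
    for (int p = 0; p < partitionSizesMB.length; p++) {
      // Close the current group only if it already holds at least one partition.
      if (p > start && current + partitionSizesMB[p] > desiredSizeMB) {
        groups.add(new Range(start, p));
        start = p;
        current = 0;
      }
      current += partitionSizesMB[p];
    }
    if (partitionSizesMB.length > 0) {
      groups.add(new Range(start, partitionSizesMB.length));
    }
    return groups;
  }

  /**
   * Fair routing: the input of one destination task is a consecutive range of
   * partitions taken from a consecutive range of source tasks.
   */
  static final class DestinationInput {
    final Range partitions;
    final Range sourceTasks;

    DestinationInput(Range partitions, Range sourceTasks) {
      this.partitions = partitions;
      this.sourceTasks = sourceTasks;
    }

    /** True if this destination task fetches the given partition from the given source task. */
    boolean fetches(int sourceTask, int partition) {
      return sourceTasks.contains(sourceTask) && partitions.contains(partition);
    }
  }
}
```

With partition sizes {10, 20, 200, 5, 5, 5} MB and a desired size of 50 MB, 
groupPartitions leaves the 200 MB partition in a group of its own. Under fair 
routing, that partition could then be covered by two DestinationInput 
instances, for example source tasks [0, 50) and [50, 100), so that no single 
destination task has to fetch all of it.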

Besides the core routing functionality, the patch also includes the 
following.

* Move global stats to per source vertex. This allows a more accurate 
estimate of the partition size, given that one source vertex can be much larger 
than the others. In addition, stats change from long to int, as the unit is 
MB. So there is some impact on memory even for ShuffleVertexManager, but the 
net impact should be acceptable. For joining, say, 20 source vertexes with 20k 
destination tasks, the size is 4 bytes * 20k * 20 = 1.6 MB. If we want to be 
safe, we can make this change specific to FairShuffleVertexManager. But we 
might need it anyway for TEZ-2962.

* Refactor test code.
** The common test cases for both ShuffleVertexManager and 
FairShuffleVertexManager are moved to TestShuffleVertexManagerBase.
** TestShuffleVertexManager was still verifying EdgeManager via 
EdgeManagerPlugin#routeDataMovementEventToDestination. It should use 
EdgeManagerPluginOnDemand instead, as that is what is actually used.
** Break testShuffleVertexManagerAutoParallelism into individual test cases.
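
The memory estimate for the per-source-vertex stats can be reproduced with a 
few lines of Java. This is only a sketch: the helper name is hypothetical, and 
it assumes one stats entry per destination partition per source vertex, 
ignoring array and object overhead.

```java
// Hypothetical helper; NOT code from TEZ-3269.patch.
// Assumes one stats entry per destination partition per source vertex.
final class StatsFootprintSketch {

  /** Raw size in bytes of the stats entries, excluding array/object overhead. */
  static long statsBytes(int sourceVertices, int partitions, int bytesPerEntry) {
    // Widen to long before multiplying so large inputs do not overflow int.
    return (long) sourceVertices * partitions * bytesPerEntry;
  }
}
```

Calling statsBytes(20, 20_000, Integer.BYTES) gives the footprint with int 
entries, and passing Long.BYTES instead gives the footprint if the entries 
stayed long, which is twice as large.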

> Provide basic fair routing and scheduling functionality via custom 
> VertexManager and EdgeManager
> ------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3269
>                 URL: https://issues.apache.org/jira/browse/TEZ-3269
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Ming Ma
>         Attachments: TEZ-3269.patch
>
>
> With TEZ-3206 and TEZ-3216, we can build a custom VertexManager and 
> EdgeManager that uses partition stats to do fair routing as well as the 
> scheduling based on destination tasks’ dependency on source tasks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
