[
https://issues.apache.org/jira/browse/TEZ-2118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327124#comment-14327124
]
Siddharth Seth commented on TEZ-2118:
-------------------------------------
Some more details.
This allows for a hybrid mode of execution - where some vertices may run in an
external service, while others run via regular containers.
Here are some of the possible scenarios for running a single vertex that could
eventually be supported. Different vertices could use different modes.
||SchedulerSrc||Launcher||TaskComm||Scenario||
|Ext|Ext|Ext|External executor (e.g. LLAP)|
|YARN|Regular|Regular|Current container execution, fault tolerance - nth attempt|
|YARN|Ext|Ext|External executor, but scheduled via YARN|
|YARN|Ext2|Ext|ExtExecutor with custom TezChild equivalent|
Planning on modeling this as follows.
- AM startup needs to specify all entities that may be used (likely as named
entities).
- DAGs are set up, and each vertex specifies the combination that it will use.
This can be enhanced later to be set up via a policy.
- SchedulerManager (TSEH), ContainerLauncherManager and TaskCommManager get
enough information on each task / container about the entities to be used.
Essentially, the data starts flowing from the point when a TaskAttempt is
scheduled, and reaches the relevant components.
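A rough sketch of how the three points above could fit together, in Java. This is only an illustration under assumptions: every class and method name below (AmRegistry, ExecutionContext, Dag, contextFor) is hypothetical and not an existing Tez API, and the entity names are taken from the table above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch only: none of these types exist in Tez.
final class VertexExecutionSketch {

    /** The scheduler / launcher / task communicator names a vertex asks for. */
    static final class ExecutionContext {
        final String scheduler;
        final String launcher;
        final String taskComm;

        ExecutionContext(String scheduler, String launcher, String taskComm) {
            this.scheduler = scheduler;
            this.launcher = launcher;
            this.taskComm = taskComm;
        }
    }

    /** Named entities declared once, at AM startup. */
    static final class AmRegistry {
        private final Set<String> schedulers;
        private final Set<String> launchers;
        private final Set<String> taskComms;

        AmRegistry(Set<String> schedulers, Set<String> launchers, Set<String> taskComms) {
            this.schedulers = new HashSet<>(schedulers);
            this.launchers = new HashSet<>(launchers);
            this.taskComms = new HashSet<>(taskComms);
        }

        /** Rejects a combination naming an entity the AM was not started with. */
        void validate(ExecutionContext ctx) {
            if (!schedulers.contains(ctx.scheduler)
                    || !launchers.contains(ctx.launcher)
                    || !taskComms.contains(ctx.taskComm)) {
                throw new IllegalArgumentException("Unknown entity in combination: "
                        + ctx.scheduler + "/" + ctx.launcher + "/" + ctx.taskComm);
            }
        }
    }

    /** Each vertex records the combination it will use. */
    static final class Dag {
        private final AmRegistry registry;
        private final Map<String, ExecutionContext> byVertex = new HashMap<>();

        Dag(AmRegistry registry) {
            this.registry = registry;
        }

        void addVertex(String vertexName, ExecutionContext ctx) {
            registry.validate(ctx);
            byVertex.put(vertexName, ctx);
        }

        /** Looked up at the point a task attempt of the vertex is scheduled. */
        ExecutionContext contextFor(String vertexName) {
            ExecutionContext ctx = byVertex.get(vertexName);
            if (ctx == null) {
                throw new IllegalArgumentException("Unknown vertex: " + vertexName);
            }
            return ctx;
        }
    }

    public static void main(String[] args) {
        AmRegistry registry = new AmRegistry(
                new HashSet<>(Arrays.asList("YARN", "Ext")),
                new HashSet<>(Arrays.asList("Regular", "Ext")),
                new HashSet<>(Arrays.asList("Regular", "Ext")));
        Dag dag = new Dag(registry);
        dag.addVertex("v1", new ExecutionContext("YARN", "Regular", "Regular"));
        dag.addVertex("v2", new ExecutionContext("Ext", "Ext", "Ext"));
        ExecutionContext ctx = dag.contextFor("v2");
        System.out.println(ctx.scheduler + "/" + ctx.launcher + "/" + ctx.taskComm);
    }
}
```

Validating at addVertex time is one possible choice here; it surfaces a vertex that names an undeclared entity when the DAG is set up rather than when its first task attempt is scheduled.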
There are implications for Vertex / DAG level entities, e.g. APIs like
getTotalAvailableResources (which will likely be based on the main scheduler
specified for a Vertex), the DAGScheduler (which doesn't change for the
duration of the DAG), etc.
> Allow for the scheduler, launcher, task communicator to be specified per
> vertex
> -------------------------------------------------------------------------------
>
> Key: TEZ-2118
> URL: https://issues.apache.org/jira/browse/TEZ-2118
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Siddharth Seth
> Assignee: Siddharth Seth
>