[ https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693953#comment-14693953 ]
Siddharth Seth commented on TEZ-2003:
-------------------------------------
bq. All the new service plugins should be in the runtime-api package, where the
rest of the user-defined plugins are currently placed. They should not be in
the DAG API, which deals with defining the DAG structure (as opposed to pluggable
user components).
They're not in the DAG API. They reside in a separate package called
serviceplugin.api. They don't belong in runtime-api, which is primarily for
Inputs, Outputs and Processors.
bq. executeInContainer should be part of the default execution context that is
created by the user (for other frameworks) or created internally by TezClient
or the DAGAppMaster if no default is specified. That way things continue to run
on YARN as is.
That's already the case. executeInContainer exists for cases where the
DAG-level defaults are set to something else and a specific vertex needs to be
executed in containers.
bq. This change for hybrid execution is a fundamental and important change for
Tez. ...
The changes are limited to the DAGAppMaster, the client, and the controllers for
the individual plugins; they are not spread all over the place. This approach
definitely makes testing easier, at least given how the tests are structured
right now, and the tests have already been changed accordingly. We can consider
setting everything up in the DAGAppMaster at a later point; changing the tests
is the tiresome part there.
bq. Until now, there wasn't any special logic in VertexImpl for local mode. ...
Earlier there were no plugins, so whichever default scheduler had been set up
was the one that ended up getting used. Now a schedulerId needs to be sent
along with request events (as well as launcher and taskComm ids). For local
mode, this would always be 0. For regular container mode, it would also be 0.
However, there would be no checks while sending this out.
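A minimal sketch of that id scheme, assuming plugins are registered in order with the default (YARN containers, or the local scheduler in local mode) at index 0, so a request that names no plugin resolves to 0. `PluginRegistry` and `TaskRequest` are hypothetical names for illustration, not Tez classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical registry: plugin names resolve to indices, and index 0 is the default. */
final class PluginRegistry {
    private final List<String> names = new ArrayList<>();

    PluginRegistry(String defaultPlugin, List<String> additionalPlugins) {
        names.add(defaultPlugin);          // index 0: the default in both local and container mode
        names.addAll(additionalPlugins);   // custom plugins get indices 1..n in registration order
    }

    /** Returns the index for a plugin name, or 0 when no name is given. */
    int idFor(String name) {
        if (name == null) {
            return 0;
        }
        int id = names.indexOf(name);
        if (id < 0) {
            throw new IllegalArgumentException("Unknown plugin: " + name);
        }
        return id;
    }

    List<String> names() {
        return Collections.unmodifiableList(names);
    }
}

/** Hypothetical request event: the scheduler id always travels with the request. */
final class TaskRequest {
    final String taskAttemptId;
    final int schedulerId;

    TaskRequest(String taskAttemptId, int schedulerId) {
        this.taskAttemptId = taskAttemptId;
        this.schedulerId = schedulerId;
    }
}
```

Because the id is always sent explicitly, the default case needs no special handling: it is simply id 0.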
The execution context is not sent along with the payload from the client if it
is not specified. It is only available in the DAGPlan and VertexPlan, which is
why it's read in DAG and Vertex. The DAGAppMaster should not be in the business
of parsing and understanding plan bits that are not relevant to it. VertexImpl
seems to be the right place to handle overrides.
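A rough sketch of the override rule, assuming a vertex-level context (if present) replaces the DAG-level default, and that with neither set the vertex runs in regular containers as it does on YARN today. `ExecutionContext` and `ContextResolver` are hypothetical stand-ins, not the Tez API.

```java
/** Hypothetical execution context: regular containers, or a named set of plugins. */
final class ExecutionContext {
    final boolean executeInContainers;
    final String schedulerName;   // null when executeInContainers is true
    final String launcherName;
    final String taskCommName;

    private ExecutionContext(boolean inContainers, String scheduler, String launcher, String taskComm) {
        this.executeInContainers = inContainers;
        this.schedulerName = scheduler;
        this.launcherName = launcher;
        this.taskCommName = taskComm;
    }

    static ExecutionContext inContainers() {
        return new ExecutionContext(true, null, null, null);
    }

    static ExecutionContext withPlugins(String scheduler, String launcher, String taskComm) {
        return new ExecutionContext(false, scheduler, launcher, taskComm);
    }
}

final class ContextResolver {
    private ContextResolver() {}

    /** A vertex override wins; otherwise the DAG default; otherwise regular containers. */
    static ExecutionContext resolve(ExecutionContext dagDefault, ExecutionContext vertexOverride) {
        if (vertexOverride != null) {
            return vertexOverride;
        }
        if (dagDefault != null) {
            return dagDefault;
        }
        return ExecutionContext.inContainers();
    }
}
```

With a DAG default of `withPlugins("llap", "llap", "llap")`, a vertex that sets `inContainers()` still runs in containers, which is the case executeInContainer is meant for.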
bq. Similar creating a SchedulingPlugin object is an example of defensive
programming where we pass around this object which has semantic meaning instead
of passing around int types. Sure, entities which need 2 out of 3 can choose to
use only the getters of 2 out of 3. But essentially tracking that object allows us
to clearly see which parts of the code/events are related to plugins and which
parts are not. Tracking ints does not provide that visibility.
I think that's far less safe: a method may expose all three ids when only one
of the three has been set. The current approach is explicit about what needs
to be set (and hence is available) and what does not.
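To make the concern concrete, a sketch contrasting the two styles. `PluginIdHolder` stands in for a single object carrying all three ids, any of which may be unset; `AllocateRequestEvent` takes only the id it needs. Both are hypothetical, not existing Tez classes.

```java
import java.util.OptionalInt;

/** Hypothetical holder exposing all three ids, any of which may be unset. */
final class PluginIdHolder {
    private final OptionalInt schedulerId;
    private final OptionalInt launcherId;
    private final OptionalInt taskCommId;

    PluginIdHolder(OptionalInt schedulerId, OptionalInt launcherId, OptionalInt taskCommId) {
        this.schedulerId = schedulerId;
        this.launcherId = launcherId;
        this.taskCommId = taskCommId;
    }

    // The type does not tell callers which getters are safe; an unset id only fails at runtime.
    int getSchedulerId() {
        return schedulerId.orElseThrow(() -> new IllegalStateException("scheduler id was never set"));
    }

    int getLauncherId() {
        return launcherId.orElseThrow(() -> new IllegalStateException("launcher id was never set"));
    }

    int getTaskCommId() {
        return taskCommId.orElseThrow(() -> new IllegalStateException("taskComm id was never set"));
    }
}

/** Hypothetical explicit event: the constructor states exactly which id is required. */
final class AllocateRequestEvent {
    final int schedulerId;

    AllocateRequestEvent(int schedulerId) {
        this.schedulerId = schedulerId;
    }
}
```

With the explicit form, a missing id is a compile error at the construction site rather than a runtime failure in whichever consumer happens to call the wrong getter.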
bq. An uber comment on uber mode is that it seems dangerous to run uber mode
tasks within the AM.
I think some more work is required before uber mode is officially supported. I
can see both modes working: within the AM process, or as a sub-process. In any
case, I'm going to call this out in the Javadocs until additional work is done
to formalize support for uber mode.
bq. Not sure how this is fixed. Here is the code fragment from my initial
comment. In some places we are arbitrarily passing back schedulerId = 0.
The 0s were bugs. The JIRA to handle different nodes has been fixed. TEZ-2707
fixed the 0s, and TEZ-2313 added tests for this.
bq. Please take a look at the MockDAGAppMaster code. numUpdates is used
internally by that code. So the increment is still needed.
Will do.
bq. What will happen if the DAG starts to run/launch new tasks while the
communicator is still processing the completion of the previous DAG? Say a launch
on the communicator is invoked. To process the launch it may call getDAG(),
which will then either return the wrong DAG or get stuck (or deadlocked?) behind
the dagChangedReadLock.
The plugins are informed of DAG completion, and should be written to handle
this. I don't think there's a lot more that can be done to protect against
updates coming in from an old DAG while a new DAG has been submitted.
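One way a plugin might handle this, sketched under the assumption that it is told when each DAG starts and completes and that every incoming update carries a DAG id. `DagAwareCommunicator` is hypothetical. The check is not atomic with the forwarding step, so it filters clearly stale updates rather than closing the race entirely.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical plugin that remembers which DAG it is serving and drops late updates. */
final class DagAwareCommunicator {
    private static final int NO_DAG = -1;
    private final AtomicInteger currentDagId = new AtomicInteger(NO_DAG);
    private final AtomicInteger droppedUpdates = new AtomicInteger();

    void dagStarted(int dagId) {
        currentDagId.set(dagId);
    }

    /** Clears the current DAG only if the completed DAG is still the current one. */
    void dagComplete(int dagId) {
        currentDagId.compareAndSet(dagId, NO_DAG);
    }

    /** Returns true if the update was accepted, false if it belonged to an old DAG. */
    boolean onTaskUpdate(int dagId, String taskAttemptId) {
        if (dagId != currentDagId.get()) {
            droppedUpdates.incrementAndGet();
            return false;
        }
        // Forward the update for taskAttemptId to the current DAG here. The DAG can still
        // complete between the check above and this point; the plugin must tolerate that.
        return true;
    }

    int droppedUpdates() {
        return droppedUpdates.get();
    }
}
```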
bq. Not sure why SchedulerEvent/ContainerEvent base classes would cause
complications. Every scheduler event now needs a scheduler id. So every new
event needs to have that specified. So a base class that keeps that code in one
place sounds like a transparent change.
SchedulerEvents were fixed in TEZ-2707. I was likely confusing the complexity
with AMNodeEvents, which had been changed earlier. Container events don't need
this: only a single event, the launch request, actually contains any details
about the plugins.
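A sketch of the resulting event shape, assuming scheduler events share a base class that owns the scheduler id, while container events carry no plugin ids except on the launch request. All class names here are hypothetical, not the actual Tez event hierarchy.

```java
/** Hypothetical base class: every scheduler event names the scheduler it targets. */
abstract class SchedulerEvent {
    private final int schedulerId;

    protected SchedulerEvent(int schedulerId) {
        if (schedulerId < 0) {
            throw new IllegalArgumentException("schedulerId must be >= 0: " + schedulerId);
        }
        this.schedulerId = schedulerId;
    }

    final int getSchedulerId() {
        return schedulerId;
    }
}

/** Hypothetical subclass: inherits the id handling instead of repeating it. */
final class DeallocateTaskEvent extends SchedulerEvent {
    final String taskAttemptId;

    DeallocateTaskEvent(int schedulerId, String taskAttemptId) {
        super(schedulerId);
        this.taskAttemptId = taskAttemptId;
    }
}

/** Hypothetical container event: carries no plugin ids by default. */
class ContainerEvent {
    final String containerId;

    ContainerEvent(String containerId) {
        this.containerId = containerId;
    }
}

/** Hypothetical launch request: the one container event that names all three plugins. */
final class ContainerLaunchRequest extends ContainerEvent {
    final int schedulerId;
    final int launcherId;
    final int taskCommId;

    ContainerLaunchRequest(String containerId, int schedulerId, int launcherId, int taskCommId) {
        super(containerId);
        this.schedulerId = schedulerId;
        this.launcherId = launcherId;
        this.taskCommId = taskCommId;
    }
}
```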
> [Umbrella] Allow Tez to co-ordinate execution to external services
> ------------------------------------------------------------------
>
> Key: TEZ-2003
> URL: https://issues.apache.org/jira/browse/TEZ-2003
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt,
> 2003_20150807.2.txt, Tez With External Services.pdf
>
>
> The Tez engine itself takes care of co-ordinating execution - controlling how
> data gets routed (different connection patterns), fault tolerance, scheduling
> of work, etc.
> This is currently tied to TaskSpecs defined within Tez and on containers
> launched by Tez itself (TezChild).
> The proposal is to allow Tez to work with external services instead of just
> containers launched by Tez. This involves several more pluggable layers to
> work with alternate Task Specifications, custom launch and task allocation
> mechanics, as well as custom scheduling sources.
> A simple example would be a process with the capability to execute
> multiple Tez TaskSpecs as threads. In such a case, a container launch isn't
> really needed and can be mocked. Sourcing / scheduling containers would need to
> be pluggable.
> A more advanced example would be LLAP (HIVE-7926;
> https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf).
> This works with custom interfaces - which would need to be supported by Tez,
> along with a custom event model which would need translation hooks.
> Tez should be able to work with a combination of certain vertices running in
> external services and others running in regular Tez containers.
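The "simple example" in the description above, a process that runs multiple TaskSpecs as threads with a mocked container launch, could look roughly like this. `SimpleTaskSpec`, `ThreadedTaskService` and `NoOpContainerLauncher` are hypothetical stand-ins, not Tez classes; a real TaskSpec carries far more than a Runnable.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a TaskSpec: an id plus the work to run. */
final class SimpleTaskSpec {
    final String id;
    final Runnable work;

    SimpleTaskSpec(String id, Runnable work) {
        this.id = id;
        this.work = work;
    }
}

/** Hypothetical service process that executes many task specs as threads. */
final class ThreadedTaskService implements AutoCloseable {
    private final ExecutorService pool;

    ThreadedTaskService(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    Future<?> submit(SimpleTaskSpec spec) {
        return pool.submit(spec.work);
    }

    /** Stops accepting work and waits (up to 30s) for submitted tasks to finish. */
    @Override
    public void close() {
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

/** Hypothetical mocked launcher: the service is already running, so launch only records the call. */
final class NoOpContainerLauncher {
    private int launchCount;

    void launch(String containerId) {
        launchCount++;  // no process is started; tasks go straight to the running service
    }

    int launchCount() {
        return launchCount;
    }
}
```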
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)