[https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680721#comment-14680721]
Siddharth Seth edited comment on TEZ-2003 at 8/10/15 9:09 PM:
--------------------------------------------------------------
bq. logErrorIngored, hearbeats, getCurretnDagName
bq. - remove “*” imports, e.g. import org.apache.tez.common.*;
Captured in TEZ-2678
bq. abortTask vs close/cleanup
Will check the code. abortTask should try cleaning up in both of them.
bq. TezTaskRunner2
killTask isn't used yet within Tez, which is why it's not informing the AM.
When task preemption comes in - the flow is likely to be a killTask invoked as
a result of an RPC, at which point the AM already knows that the task is killed
since it took the decision.
On the various atomic gets - there are separate variables that track which states
have been set, and these are used in the return result. Atomicity of the entire
operation is handled via synchronized blocks.
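A minimal sketch of that pattern, with hypothetical names (TaskStateTracker, markKilled and markCompleted are illustrative, not the actual TezTaskRunner2 code): plain flags record which transitions have happened, and one lock makes the check, the update and the returned result a single atomic step.

```java
// Hypothetical sketch, not the TezTaskRunner2 implementation.
// Separate flags record which terminal transition has happened; a single
// lock makes check-then-set-then-return atomic as a whole.
class TaskStateTracker {
  private final Object lock = new Object();
  private boolean killed = false;    // set once a kill has been recorded
  private boolean completed = false; // set once the task has finished

  /** Returns true only if this call is the one that moved the task to killed. */
  boolean markKilled() {
    synchronized (lock) {
      if (killed || completed) {
        return false; // another transition already happened; report no change
      }
      killed = true;
      return true;
    }
  }

  /** Returns true only if the task completed before any kill was recorded. */
  boolean markCompleted() {
    synchronized (lock) {
      if (killed || completed) {
        return false;
      }
      completed = true;
      return true;
    }
  }

  boolean isKilled() {
    synchronized (lock) {
      return killed;
    }
  }
}
```

Since every read and write of the flags happens under the same lock, the flags themselves don't need to be atomics or volatile.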
TaskRunner handling containerStop is a result of containerStop coming over a
shared Task/Container protocol - which is linked to the running task. It could
be separated, but I think that'll need the protocols to be separated as well.
canCommit during a shutdown - will change this. I'll also verify what the
TaskRunner behaviour was. Tracking this in TEZ-2678.
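A rough sketch of the canCommit change, assuming it ends up refusing commit permission once shutdown has started. CommitGate is hypothetical, and the BooleanSupplier stands in for the actual call to the AM.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: refuse commit permission once shutdown has begun,
// so a task that is being torn down never starts committing its output.
class CommitGate {
  private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);

  void shutdown() {
    shutdownRequested.set(true);
  }

  /** Asks the AM (via askAm) for commit permission unless shutting down. */
  boolean canCommit(BooleanSupplier askAm) {
    if (shutdownRequested.get()) {
      return false; // don't contact the AM at all once shutdown has started
    }
    // Note: a shutdown that starts after the check above is not caught here.
    return askAm.getAsBoolean();
  }
}
```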
bq. TaskReporter
I don't think shutdown needs synchronization. It modifies a final variable.
Whether it's implemented correctly needs more investigation. It's the same as
what exists on master.
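To illustrate the reasoning, assuming the final field refers to a thread-safe object such as an AtomicBoolean (SketchTaskReporter and its methods are hypothetical, not the real TaskReporter): the reference itself never changes, and AtomicBoolean.set() is a volatile write, so the heartbeat thread sees the shutdown without extra locking.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: shutdown() only flips a flag held in a final field.
// The field is never reassigned, and the AtomicBoolean gives cross-thread
// visibility, so no synchronized block is needed for the flag itself.
class SketchTaskReporter {
  private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);
  private final AtomicInteger heartbeatsSent = new AtomicInteger();

  /** Heartbeat loop, run on its own thread; exits once shutdown is observed. */
  void runHeartbeats() throws InterruptedException {
    while (!shutdownRequested.get()) {
      heartbeatsSent.incrementAndGet(); // stand-in for sending a heartbeat
      Thread.sleep(10);
    }
  }

  void shutdown() {
    shutdownRequested.set(true); // volatile write, visible to runHeartbeats()
  }

  int getHeartbeatsSent() {
    return heartbeatsSent.get();
  }
}
```

This only covers visibility of the flag; if shutdown also has to coordinate with an in-flight heartbeat, that part still needs the investigation mentioned above.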
bq. ShuffleHandler
This is essentially the shuffle handler that is used in regular clusters. It's
not meant as a benchmark tool. Using the current shuffle mechanics seems like
the simplest mechanism to have jobs work with the standard set of
Inputs/Outputs which write to disk.
bq. ext-service-tests
Agree with making this a reference for ext services. It would need to implement
the APIs better, and be documented a lot better to serve this purpose. Creating
a new jira to track this - TEZ-2705. Post merge?
bq. JoinValidate
The changes are for private use, to be able to re-use the example in testing.
Will add docs to mention this.
bq. TezTaskCommunicatorImpl
Using payloads wherever possible - including internal plugins. Avoided in
LocalContainerLauncher only at the moment, where a lot of runtime AM
information is used.
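Roughly what payload-based configuration looks like, as a sketch under an assumed payload layout (a single int port). PayloadConfiguredCommunicator and encode() are hypothetical; the point is that the plugin reads everything it needs from the opaque payload instead of reaching into AM runtime state.

```java
import java.nio.ByteBuffer;

// Hypothetical plugin configured only via an opaque payload, rather than by
// reaching into AM runtime state.
class PayloadConfiguredCommunicator {
  private final int port;

  PayloadConfiguredCommunicator(ByteBuffer payload) {
    // Assumed payload layout for this sketch: one big-endian int (the port).
    // duplicate() leaves the caller's buffer position untouched.
    this.port = payload.duplicate().getInt();
  }

  int getPort() {
    return port;
  }

  /** Client-side helper that builds a payload in the assumed layout. */
  static ByteBuffer encode(int port) {
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES);
    buf.putInt(port);
    buf.flip(); // make the written bytes readable
    return buf;
  }
}
```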
Will fix isKnownContainer and containerAlive to be based on the specific
communicator.
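A sketch of the isKnownContainer fix, assuming each container is tracked against the communicator that owns it. CommunicatorRouter and SketchCommunicator are hypothetical names; the same routing would apply to containerAlive.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical communicator plugin that knows about its own containers only.
class SketchCommunicator {
  private final Set<String> containers = new HashSet<>();

  void register(String containerId) {
    containers.add(containerId);
  }

  boolean isKnownContainer(String containerId) {
    return containers.contains(containerId);
  }
}

// Hypothetical AM-side router: liveness checks go to the owning plugin only.
class CommunicatorRouter {
  private final List<SketchCommunicator> communicators;
  private final Map<String, Integer> owner = new HashMap<>(); // container -> plugin index

  CommunicatorRouter(List<SketchCommunicator> communicators) {
    this.communicators = communicators;
  }

  void registerContainer(String containerId, int communicatorIndex) {
    owner.put(containerId, communicatorIndex);
    communicators.get(communicatorIndex).register(containerId);
  }

  /** Consults only the communicator that owns the container, not all of them. */
  boolean isKnownContainer(String containerId) {
    Integer index = owner.get(containerId);
    return index != null && communicators.get(index).isKnownContainer(containerId);
  }
}
```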
Renaming methods in TaskComm - tracked in the TaskComm enhancements jira
getDagName null - will try improving this.
getVertexName - I'm not sure there's a lot that can be done. TezException
instead of NPE? Eventually this will lead to an error in the plugin, which
needs to be handled better. There's a jira to track such error handling.
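A sketch of the TezException option, using a stand-in exception class so it compiles without Tez on the classpath (VertexNameLookup and SketchTezException are hypothetical): a missing DAG or vertex becomes a descriptive checked exception the plugin has to handle, instead of an NPE.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for org.apache.tez.dag.api.TezException so the sketch is self-contained.
class SketchTezException extends Exception {
  SketchTezException(String message) {
    super(message);
  }
}

// Hypothetical lookup: fail with a descriptive checked exception rather than
// letting a missing DAG surface as a NullPointerException inside a plugin.
class VertexNameLookup {
  private final Map<String, Map<Integer, String>> vertexNamesByDag = new HashMap<>();

  void addVertex(String dagName, int vertexIndex, String vertexName) {
    vertexNamesByDag.computeIfAbsent(dagName, k -> new HashMap<>()).put(vertexIndex, vertexName);
  }

  String getVertexName(String dagName, int vertexIndex) throws SketchTezException {
    Map<Integer, String> vertices = vertexNamesByDag.get(dagName);
    if (vertices == null) {
      throw new SketchTezException("No running DAG named " + dagName);
    }
    String name = vertices.get(vertexIndex);
    if (name == null) {
      throw new SketchTezException("Unknown vertex index " + vertexIndex + " in DAG " + dagName);
    }
    return name;
  }
}
```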
onStateUpdated - this is the AM telling the TaskCommunicator plugin that a vertex
has changed state. Similar to what is done elsewhere - like the
InputInitializers.
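A sketch of the notification pattern, modelled loosely on how InputInitializers register for vertex state updates. All types here (VertexStateNotifier, VertexStateListener, SketchVertexState) are hypothetical stand-ins, not the Tez API: a plugin registers interest in specific states of a vertex, and the AM pushes only matching transitions to it.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical subset of vertex states, for illustration only.
enum SketchVertexState { INITIALIZING, RUNNING, SUCCEEDED, FAILED }

// Hypothetical callback implemented by the plugin.
interface VertexStateListener {
  void onStateUpdated(String vertexName, SketchVertexState newState);
}

// Hypothetical AM-side dispatcher for vertex state updates.
class VertexStateNotifier {
  private static final class Registration {
    final Set<SketchVertexState> states;
    final VertexStateListener listener;

    Registration(Set<SketchVertexState> states, VertexStateListener listener) {
      this.states = states;
      this.listener = listener;
    }
  }

  private final Map<String, List<Registration>> registrations = new HashMap<>();

  /** A plugin asks to be told when vertexName enters any of the given states. */
  void register(String vertexName, Set<SketchVertexState> states, VertexStateListener listener) {
    registrations.computeIfAbsent(vertexName, k -> new ArrayList<>())
        .add(new Registration(new HashSet<>(states), listener));
  }

  /** Called by the AM on every vertex transition; forwards matching ones. */
  void vertexStateChanged(String vertexName, SketchVertexState newState) {
    for (Registration r : registrations.getOrDefault(vertexName, Collections.emptyList())) {
      if (r.states.contains(newState)) {
        r.listener.onStateUpdated(vertexName, newState);
      }
    }
  }
}
```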
dagCompleteStart - couldn't find this. Maybe I removed it at some point for the
same reason: it's a very confusing name.
bq. Is there a need for the framework to make updates into the Context object?
If yes, should the Context implement 2 interfaces? Should the internal objects
just bind to the internal Impl objects or are they bound to the public plugin
interfaces to catch compat errors? Binding to Impls directly may mean a smaller
public API interface.
Need more clarification on this comment.
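One possible reading of the question, sketched with hypothetical names (PluginContext, PluginContextUpdater, PluginContextImpl): a single Impl implements a public, read-only plugin interface plus a framework-internal update interface, and each side only holds the view it needs.

```java
// Hypothetical public, read-only view that a plugin is compiled against.
interface PluginContext {
  String getCurrentDagName();
}

// Hypothetical framework-internal view used by the AM to push updates.
interface PluginContextUpdater {
  void setCurrentDagName(String dagName);
}

// One Impl implements both; plugins only ever see it as a PluginContext,
// while framework code holds the PluginContextUpdater view.
class PluginContextImpl implements PluginContext, PluginContextUpdater {
  private volatile String currentDagName; // written by the AM, read by the plugin

  @Override
  public String getCurrentDagName() {
    return currentDagName;
  }

  @Override
  public void setCurrentDagName(String dagName) {
    this.currentDagName = dagName;
  }
}
```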
bq. ctor.setAccessible(true);
Will do.
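For reference, the usual reflective instantiation with setAccessible(true). SamplePlugin and PluginLoader are hypothetical, and the single String constructor argument stands in for whatever context object the plugin takes.

```java
import java.lang.reflect.Constructor;

// Hypothetical plugin whose constructor is private.
class SamplePlugin {
  final String context;

  private SamplePlugin(String context) {
    this.context = context;
  }
}

// Hypothetical loader that instantiates plugins via a one-argument constructor.
final class PluginLoader {
  private PluginLoader() {}

  static <T> T instantiate(Class<T> clazz, Class<?> argType, Object arg)
      throws ReflectiveOperationException {
    Constructor<T> ctor = clazz.getDeclaredConstructor(argType);
    // Allow non-public constructors; without this, newInstance() would throw
    // IllegalAccessException for the private constructor above.
    ctor.setAccessible(true);
    return ctor.newInstance(arg);
  }
}
```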
> [Umbrella] Allow Tez to co-ordinate execution to external services
> ------------------------------------------------------------------
>
> Key: TEZ-2003
> URL: https://issues.apache.org/jira/browse/TEZ-2003
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt,
> 2003_20150807.2.txt, Tez With External Services.pdf
>
>
> The Tez engine itself takes care of co-ordinating execution - controlling how
> data gets routed (different connection patterns), fault tolerance, scheduling
> of work, etc.
> This is currently tied to TaskSpecs defined within Tez and to containers
> launched by Tez itself (TezChild).
> The proposal is to allow Tez to work with external services instead of just
> containers launched by Tez. This involves several more pluggable layers to
> work with alternate Task Specifications, custom launch and task allocation
> mechanics, as well as custom scheduling sources.
> A simple example would be a process with the capability to execute
> multiple Tez TaskSpecs as threads. In such a case, a container launch isn't
> really needed and can be mocked. Sourcing / scheduling containers would need to
> be pluggable.
> A more advanced example would be LLAP (HIVE-7926;
> https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf).
> This works with custom interfaces - which would need to be supported by Tez,
> along with a custom event model which would need translation hooks.
> Tez should be able to work with a combination of certain vertices running in
> external services and others running in regular Tez containers.
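A sketch of the simple example in the description (one process executing multiple TaskSpecs as threads, with the container launch mocked). SketchContainerLauncher, NoOpContainerLauncher, SketchTaskSpec and ThreadedTaskExecutor are hypothetical types, not the pluggable interfaces Tez actually ends up with.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical pluggable launch layer; by default it would start a TezChild.
interface SketchContainerLauncher {
  void launch(String containerId);
}

// For an in-process service the "container" is a slot in an already running
// process, so launching is simply mocked out.
class NoOpContainerLauncher implements SketchContainerLauncher {
  @Override
  public void launch(String containerId) {
    // Nothing to start.
  }
}

// Hypothetical stand-in for a TaskSpec: a name plus a unit of work.
final class SketchTaskSpec {
  final String taskName;
  final Runnable work;

  SketchTaskSpec(String taskName, Runnable work) {
    this.taskName = taskName;
    this.work = work;
  }
}

// A process that executes multiple task specs concurrently as threads.
class ThreadedTaskExecutor implements AutoCloseable {
  private final ExecutorService pool;

  ThreadedTaskExecutor(int slots) {
    this.pool = Executors.newFixedThreadPool(slots);
  }

  Future<?> submit(SketchTaskSpec spec) {
    return pool.submit(spec.work);
  }

  /** Stops accepting work and waits (up to 10s) for running tasks to finish. */
  @Override
  public void close() throws InterruptedException {
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.SECONDS);
  }
}
```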
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)