[jira] [Comment Edited] (TEZ-2003) [Umbrella] Allow Tez to co-ordinate execution to external services

Siddharth Seth (JIRA) Mon, 10 Aug 2015 14:10:23 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680721#comment-14680721
 ]


Siddharth Seth edited comment on TEZ-2003 at 8/10/15 9:09 PM:
--------------------------------------------------------------

bq. logErrorIngored, hearbeats, getCurretnDagName
bq. - remove “*” e,g, import org.apache.tez.common.asterisk; 
Captured in TEZ-2678

bq. abortTask vs close/cleanup
Will check the code. abortTask should try cleaning up in both of them.

bq. TezTaskRunner2
killTask isn't used yet within Tez, which is why it's not informing the AM. 
When task preemption comes in - the flow is likely to be a killTask invoked as 
a result of an RPC, at which point the AM already knows that the task is killed 
since it took the decision.

On the various atomic gets - there are separate variables to track which states 
have been set, and these are used in the return result. Atomicity of the entire 
operation is handled via synchronized blocks.
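
A minimal sketch of the pattern described above. This is not the actual TezTaskRunner2 code; the class, field and method names are hypothetical. Individual flags are atomics that can be read without a lock to build the return result, while the compound check-then-set is guarded by a synchronized block.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; names are illustrative and not taken from Tez.
class TaskStateTracker {
  private final AtomicBoolean killRequested = new AtomicBoolean(false);
  private final AtomicBoolean completed = new AtomicBoolean(false);
  private final Object lock = new Object();

  /** Returns true only if this call registered the kill. */
  boolean requestKill() {
    synchronized (lock) {
      // The check across both flags and the update must happen as one unit.
      if (completed.get()) {
        return false;
      }
      return killRequested.compareAndSet(false, true);
    }
  }

  /** Returns true only if this call marked the task as completed. */
  boolean markCompleted() {
    synchronized (lock) {
      if (killRequested.get()) {
        return false;
      }
      return completed.compareAndSet(false, true);
    }
  }

  // Single-flag reads do not need the lock.
  boolean isKillRequested() {
    return killRequested.get();
  }

  boolean isCompleted() {
    return completed.get();
  }
}
```

The atomics alone would not be enough here, since each transition depends on the value of the other flag; the lock is what makes the two-flag decision consistent.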

TaskRunner handling containerStop is a result of containerStop coming over a 
shared Task/Container protocol - which is linked to the running task. It could 
be separated, but I think that'll need the protocols to be separated as well.

canCommit during a shutdown - will change this. I'll also verify what the 
TaskRunner behaviour was. TEZ-2678

bq. TaskReporter
I don't think shutdown needs synchronization. It modifies a final variable. 
Whether it's implemented correctly needs more investigation. It's the same as 
what exists on master.
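
To illustrate why such a shutdown may not need synchronization, here is a sketch that assumes the final variable is something like an AtomicBoolean. That is an assumption, since the comment does not name the field, and the class below is hypothetical rather than the real TaskReporter.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch; not the real TaskReporter.
class HeartbeatReporter {
  // The reference is final; the flag it holds is updated atomically.
  private final AtomicBoolean shutdownRequested = new AtomicBoolean(false);
  // Only touched by the single reporting thread in this sketch.
  private int heartbeatsSent = 0;

  /** Safe to call from any thread without holding a lock. */
  void shutdown() {
    shutdownRequested.set(true);
  }

  /** Sends one heartbeat unless shutdown was requested; returns whether it was sent. */
  boolean heartbeatOnce() {
    if (shutdownRequested.get()) {
      return false;
    }
    heartbeatsSent++;
    return true;
  }

  int getHeartbeatsSent() {
    return heartbeatsSent;
  }
}
```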

bq. ShuffleHandler
This is essentially the shuffle handler that is used in regular clusters. It's 
not meant as a benchmark tool. Using the current shuffle mechanics seems like 
the simplest mechanism to have jobs work with the standard set of 
Inputs/Outputs which write to disk.

bq. ext-service-tests
Agree with making this a reference for ext services. It would need to implement 
the APIs better, and be documented a lot better to serve this purpose. Creating 
a new jira to track this - TEZ-2705. Post merge ?

bq. JoinValidate
The changes are for private use, to be able to re-use the example in testing. 
Will add docs to mention this.

bq. TezTaskCommunicatorImpl
Using payloads wherever possible - including internal plugins. Avoided in 
LocalContainerLauncher only at the moment, where a lot of runtime AM 
information is used.

Will fix isKnownContainer and containerAlive to be based on the specific 
communicator.

Renaming methods in TaskComm - tracked in the TaskComm enhancements jira

getDagName null - will try improving this.
getVertexName - I'm not sure there's a lot that can be done. TezException 
instead of NPE ? Eventually this will lead to an error in the plugin, which 
needs to be handled better. There's a jira to track such error handling.
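
A sketch of the "checked exception instead of NPE" option for a name lookup. Everything here is hypothetical: UnknownVertexException stands in for whatever exception type would actually be used (for example TezException), and VertexNameLookup is not a Tez class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the framework exception type.
class UnknownVertexException extends Exception {
  UnknownVertexException(String message) {
    super(message);
  }
}

// Hypothetical lookup; not part of the Tez API.
class VertexNameLookup {
  private final Map<Integer, String> namesByIndex = new HashMap<>();

  void register(int vertexIndex, String vertexName) {
    namesByIndex.put(vertexIndex, vertexName);
  }

  /** Fails with a descriptive checked exception instead of returning null. */
  String getVertexName(int vertexIndex) throws UnknownVertexException {
    String name = namesByIndex.get(vertexIndex);
    if (name == null) {
      throw new UnknownVertexException("No vertex registered for index " + vertexIndex);
    }
    return name;
  }
}
```

The caller still has to handle the failure, so this only moves the error to a clearer place; it does not remove the need for better error handling in the plugin.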

onStateUpdated - is the AM telling the TaskCommunicator plugin that a vertex 
has changed state. Similar to what is done elsewhere - like the 
InputInitializers.

dagCompleteStart - couldn't find this. Maybe I removed it at some point for the 
same reason - it is a very confusing name.

bq. Is there a need for the framework to make updates into the Context object? 
If yes, should the Context implement 2 interfaces? Should the internal objects 
just bind to the internal Impl objects or are they bound to the public plugin 
interfaces to catch compat errors? Binding to Impls directly may mean a smaller 
public API interface.
Need more clarification on this comment.

bq. ctor.setAccessible(true);
Will do. 
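
For context, a sketch of what calling setAccessible(true) on a constructor looks like when creating an instance reflectively. ReflectiveFactory and ExamplePlugin are hypothetical names for illustration; this is not the Tez plugin-loading code.

```java
import java.lang.reflect.Constructor;

// Hypothetical helper; not the Tez plugin-loading code.
class ReflectiveFactory {
  static <T> T newInstance(Class<T> clazz, Class<?> paramType, Object arg) {
    try {
      Constructor<T> ctor = clazz.getDeclaredConstructor(paramType);
      // Allows non-public constructors to be invoked.
      ctor.setAccessible(true);
      return ctor.newInstance(arg);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Unable to instantiate " + clazz.getName(), e);
    }
  }
}

// Hypothetical class with a private constructor.
class ExamplePlugin {
  final String name;

  private ExamplePlugin(String name) {
    this.name = name;
  }
}
```

With this in place, ReflectiveFactory.newInstance(ExamplePlugin.class, String.class, "demo") can construct the object even though its constructor is private.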


was (Author: sseth):
bq. logErrorIngored, hearbeats, getCurretnDagName
bq. - remove “*” e,g, import org.apache.tez.common.asterisk; 
Captured in TEZ-2678

bq. abortTask vs close/cleanup
Will check the code. abortTask should try cleaning up in both of them.

bq. TezTaskRunner2
killTask isn't used yet within Tez, which is why it's not informing the AM. 
When task preemption comes in - the flow is likely to be a killTask invoked as 
a result of an RPC, at which point the AM already knows that the task is killed 
since it took the decision.

On the various atomic gets - there are separate variables to track which states 
have been set, and these are used in the return result. Atomicity of the entire 
operation is handled via synchronized blocks.

TaskRunner handling containerStop is a result of containerStop coming over a 
shared Task/Container protocol - which is linked to the running task. It could 
be separated, but I think that'll need the protocols to be separated as well.

canCommit during a shutdown - will change this. I'll also verify what the 
TaskRunner behaviour was. TEZ-2678

bq. TaskReporter
I don't think shutdown needs synchronization. It modifies a final variable. 
Whether it's implemented correctly needs more investigation. It's the same as 
what exists on master.

bq. ShuffleHandler
This is essentially the shuffle handler that is used in regular clusters. It's 
not meant as a benchmark tool. Using the current shuffle mechanics seems like 
the simplest mechanism to have jobs work with the standard set of 
Inputs/Outputs which write to disk.

bq. ext-service-tests
Agree with making this a reference for ext services. It would need to implement 
the APIs better, and be documented a lot better to serve this purpose. Creating 
a new jira to track this - TEZ-2705. Post merge ?

bq. JoinValidate
The changes are for private use, to be able to re-use the example in testing. 
Will add docs to mention this.

bq. TezTaskCommunicatorImpl
Using payloads wherever possible - including internal plugins. Avoided in 
LocalContainerLauncher only at the moment, where a lot of runtime AM 
information is used.

Will fix isKnownContainer and containerAlive to be based on the specific 
communicator.

Renaming methods in TaskComm - tracked in the TaskComm enhancements jira

getDagName null - will try improving this.
getVertexName - I'm not sure there's a lot that can be done. TezException 
instead of NPE ? Eventually this will lead to an error in the plugin, which 
needs to be handled better. There's a jira to track such error handling.

onStateUpdated - is the AM telling the TaskCommunicator plugin that a vertex 
has changed state. Similar to what is done elsewhere - like the 
InputInitializers.

dagCompleteStart - couldn't find this. Maybe I removed it at some point for the 
same reason - it is a very confusing name.

bq. Is there a need for the framework to make updates into the Context object? 
If yes, should the Context implement 2 interfaces? Should the internal objects 
just bind to the internal Impl objects or are they bound to the public plugin 
interfaces to catch compat errors? Binding to Impls directly may mean a smaller 
public API interface.
Need more clarification on this comment.

bq. Is there a need for the framework to make updates into the Context object? 
If yes, should the Context implement 2 interfaces? Should the internal objects 
just bind to the internal Impl objects or are they bound to the public plugin 
interfaces to catch compat errors? Binding to Impls directly may mean a smaller 
public API interface.
Will do. 

> [Umbrella] Allow Tez to co-ordinate execution to external services
> ------------------------------------------------------------------
>
>                 Key: TEZ-2003
>                 URL: https://issues.apache.org/jira/browse/TEZ-2003
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>         Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt, 
> 2003_20150807.2.txt, Tez With External Services.pdf
>
>
> The Tez engine itself takes care of co-ordinating execution - controlling how 
> data gets routed (different connection patterns), fault tolerance, scheduling 
> of work, etc.
> This is currently tied to TaskSpecs defined within Tez and on containers 
> launched by Tez itself (TezChild).
> The proposal is to allow Tez to work with external services instead of just 
> containers launched by Tez. This involves several more pluggable layers to 
> work with alternate Task Specifications, custom launch and task allocation 
> mechanics, as well as custom scheduling sources.
> A simple example would be a simple process with the capability to execute 
> multiple Tez TaskSpecs as threads. In such a case, a container launch isn't 
> really needed and can be mocked. Sourcing / scheduling containers would need to 
> be pluggable.
> A more advanced example would be LLAP (HIVE-7926; 
> https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf).
> This works with custom interfaces - which would need to be supported by Tez, 
> along with a custom event model which would need translation hooks.
> Tez should be able to work with a combination of certain vertices running in 
> external services and others running in regular Tez containers.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
