Yingda Chen created TEZ-4000:
--------------------------------

             Summary: Enable downstream vertex connecting to an EPHEMERAL data 
source, to reason about network connections of upstream tasks.
                 Key: TEZ-4000
                 URL: https://issues.apache.org/jira/browse/TEZ-4000
             Project: Apache Tez
          Issue Type: Task
            Reporter: Yingda Chen


This is an umbrella task for TEZ-3997

Another property that is usually shared with CONCURRENT on the same edge is 
EPHEMERAL data source. When two vertices are running concurrently, direct 
communications between tasks in those vertices become possible, and oftentimes 
necessary, throughout the lifetime of the running task. This can be articulated 
by an EPHEMERAL data sources, and this change aims to support such scenarios, 
which are readily found in real-time applications (such as interactive query) 
and/or customized applications that would like to control their own data 
communications (such as parameter-server).

 

This change will allow Tez to be the central orchestrator that gathers 
necessary network information from all upstream tasks, compiles them together 
and send it to downstream tasks. Particularly, the following changes are 
planned:
 # For two vertices connected via an edge with both CONCURRENT scheduling type 
and EPHEMERAL data source type, the task in upstream vertex will open network 
port, and send an VertexManagerEvent(VME) immediately upon running. The payload 
of VME includes necessary information to communicate to this task through 
direct network communication (such as ip and port). The vertex manager of the 
downstream vertex, typed VertexManagerWithConcurrentInputs, will receive these 
VMEs, and are responsible for aggregate (including de-dup if necessary) all 
information together in onVertexManagerEventReceived(). 
 # Once all VMEs have been received, a CustomProcessorEvent will be constructed 
with a payload that includes the aggregated information, and be routed to 
downstream tasks.

The change will introduce additional optional entries in 
VertexManagerEventPayload and a new custom payload that will be embedded into 
CustomProcessorEvent. 

 

Upon completion of functional feature in this change, additional feature such 
as handling of failover in CONCURRENT/EPHEMERAL edge will be addressed in 
future umbrea JIRAs.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to