Yingda Chen created TEZ-4000:
--------------------------------
Summary: Enable downstream vertex connecting to an EPHEMERAL data
source, to reason about network connections of upstream tasks.
Key: TEZ-4000
URL: https://issues.apache.org/jira/browse/TEZ-4000
Project: Apache Tez
Issue Type: Task
Reporter: Yingda Chen
This is an umbrella task for TEZ-3997
Another property that is usually shared with CONCURRENT on the same edge is
EPHEMERAL data source. When two vertices are running concurrently, direct
communications between tasks in those vertices become possible, and oftentimes
necessary, throughout the lifetime of the running task. This can be articulated
by an EPHEMERAL data sources, and this change aims to support such scenarios,
which are readily found in real-time applications (such as interactive query)
and/or customized applications that would like to control their own data
communications (such as parameter-server).
This change will allow Tez to be the central orchestrator that gathers
necessary network information from all upstream tasks, compiles them together
and send it to downstream tasks. Particularly, the following changes are
planned:
# For two vertices connected via an edge with both CONCURRENT scheduling type
and EPHEMERAL data source type, the task in upstream vertex will open network
port, and send an VertexManagerEvent(VME) immediately upon running. The payload
of VME includes necessary information to communicate to this task through
direct network communication (such as ip and port). The vertex manager of the
downstream vertex, typed VertexManagerWithConcurrentInputs, will receive these
VMEs, and are responsible for aggregate (including de-dup if necessary) all
information together in onVertexManagerEventReceived().
# Once all VMEs have been received, a CustomProcessorEvent will be constructed
with a payload that includes the aggregated information, and be routed to
downstream tasks.
The change will introduce additional optional entries in
VertexManagerEventPayload and a new custom payload that will be embedded into
CustomProcessorEvent.
Upon completion of functional feature in this change, additional feature such
as handling of failover in CONCURRENT/EPHEMERAL edge will be addressed in
future umbrea JIRAs.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)