[ https://issues.apache.org/jira/browse/TEZ-4000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yingda Chen reassigned TEZ-4000: -------------------------------- Assignee: Yingda Chen > Enable downstream vertex connecting to an EPHEMERAL data source, to reason > about network connections of upstream tasks. > ----------------------------------------------------------------------------------------------------------------------- > > Key: TEZ-4000 > URL: https://issues.apache.org/jira/browse/TEZ-4000 > Project: Apache Tez > Issue Type: Task > Reporter: Yingda Chen > Assignee: Yingda Chen > Priority: Major > > This is an umbrella task for TEZ-3997 > Another property that is usually shared with CONCURRENT on the same edge is > EPHEMERAL data source. When two vertices are running concurrently, direct > communications between tasks in those vertices become possible, and > oftentimes necessary, throughout the lifetime of the running task. This can > be articulated by an EPHEMERAL data sources, and this change aims to support > such scenarios, which are readily found in real-time applications (such as > interactive query) and/or customized applications that would like to control > their own data communications (such as parameter-server). > > This change will allow Tez to be the central orchestrator that gathers > necessary network information from all upstream tasks, compiles them together > and send it to downstream tasks. Particularly, the following changes are > planned: > # For two vertices connected via an edge with both CONCURRENT scheduling > type and EPHEMERAL data source type, the task in upstream vertex will open > network port, and send an VertexManagerEvent(VME) immediately upon running. > The payload of VME includes necessary information to communicate to this task > through direct network communication (such as ip and port). The vertex > manager of the downstream vertex, typed VertexManagerWithConcurrentInputs, > will receive these VMEs, and are responsible for aggregate (including de-dup > if necessary) all information together in onVertexManagerEventReceived(). > # Once all VMEs have been received, a CustomProcessorEvent will be > constructed with a payload that includes the aggregated information, and be > routed to downstream tasks. > The change will introduce additional optional entries in > VertexManagerEventPayload and a new custom payload that will be embedded into > CustomProcessorEvent. > > Upon completion of functional feature in this change, additional feature such > as handling of failover in CONCURRENT/EPHEMERAL edge will be addressed in > future umbrea JIRAs. -- This message was sent by Atlassian JIRA (v7.6.3#76005)