StephanEwen opened a new pull request #10483: Operator coordinators
URL: https://github.com/apache/flink/pull/10483
 
 
   ## What is the purpose of the change
   
   This PR introduces Operator Coordinators as a part of 
[FLIP-27](https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface).
   
   Operator Coordinators are instances that exist once per operator. While the 
operators run on the TaskManagers, the coordinator runs on the JobManager. The 
coordinator communicates via events with the operators, typically to assign 
work.
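
The event exchange described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the interface added by this PR: the names `CoordinatorSketch`, `Event`, `RequestWork`, `AssignWork`, `Operator`, and `Coordinator` are all hypothetical, and direct method calls stand in for the RPC between JobManager and TaskManagers.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class CoordinatorSketch {

    /** Hypothetical marker for events exchanged between coordinator and operator. */
    interface Event {}

    /** Operator -> coordinator: a subtask asks for work. */
    static final class RequestWork implements Event {
        final int subtask;
        RequestWork(int subtask) { this.subtask = subtask; }
    }

    /** Coordinator -> operator: a unit of work is assigned. */
    static final class AssignWork implements Event {
        final String unit;
        AssignWork(String unit) { this.unit = unit; }
    }

    /** One parallel operator instance (TaskManager side in the PR's terms). */
    static final class Operator {
        final List<String> received = new ArrayList<>();

        void handleEvent(Event event) {
            if (event instanceof AssignWork) {
                received.add(((AssignWork) event).unit);
            }
        }
    }

    /** Exists once per operator (JobManager side); hands out pending work on request. */
    static final class Coordinator {
        private final Queue<String> pending;
        private final List<Operator> subtasks;

        Coordinator(List<String> work, List<Operator> subtasks) {
            this.pending = new ArrayDeque<>(work);
            this.subtasks = subtasks;
        }

        void handleEvent(Event event) {
            if (event instanceof RequestWork && !pending.isEmpty()) {
                int target = ((RequestWork) event).subtask;
                subtasks.get(target).handleEvent(new AssignWork(pending.poll()));
            }
        }
    }

    /** Two subtasks request work in turn; returns what each one received. */
    public static List<List<String>> run() {
        List<Operator> ops = List.of(new Operator(), new Operator());
        Coordinator coordinator =
                new Coordinator(List.of("split-a", "split-b", "split-c"), ops);
        coordinator.handleEvent(new RequestWork(0));
        coordinator.handleEvent(new RequestWork(1));
        coordinator.handleEvent(new RequestWork(0));
        return List.of(ops.get(0).received, ops.get(1).received);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

In this sketch the coordinator owns the list of pending work, so the parallel operator instances never need to agree among themselves on who processes what.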
   
   The first user of these coordinators will be the new source interface. The 
OperatorCoordinator will run the Source's Split Enumerator.
   This change will also allow us to remove *InputSplits* and the 
*initializeOnMaster* / *finalizeOnMaster* logic in a future step.
   
   Further users we envision are sinks (for coordinated commits of metadata), 
iterations (to gather progress and coordinate supersteps), and simple 
approximate alignments between streams (event time alignment).
   
   ## Brief change log
   
     - Introduce the `OperatorCoordinator` interface.
     - Add a way to attach a factory (`Provider`) for the Coordinator to the 
JobVertex of the JobGraph.
     - Integrate the `OperatorCoordinator` with the `ExecutionJobVertex` and the 
new scheduler (note: integration with the legacy scheduler is not planned).
     - Add the `OperatorEvent` and support for sending bidirectional 
events between Coordinator and Operator.
     - On the TaskManager / runtime side, Operators register themselves at the 
`OperatorEventDispatcher` and obtain a Gateway to send events. That way, 
operators are not tied any more strongly than they already are to the 
heavyweight `Environment` object.
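
The register-and-obtain-a-gateway pattern from the last item can be illustrated with a small sketch. It is written under assumptions and does not reproduce the signatures of `OperatorEventDispatcher` or its gateway: `DispatcherSketch`, `Gateway`, `Dispatcher`, `register`, and `dispatch` are hypothetical names, events are plain strings, and a list stands in for the connection to the coordinator.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class DispatcherSketch {

    /** Hypothetical gateway: the only handle an operator needs to reach its coordinator. */
    interface Gateway {
        void sendToCoordinator(String event);
    }

    /** Hypothetical dispatcher: routes incoming coordinator events to registered operators. */
    static final class Dispatcher {
        private final Map<String, Consumer<String>> handlers = new HashMap<>();
        // Stand-in for the channel to the coordinator on the JobManager.
        final List<String> sentToCoordinator = new ArrayList<>();

        /** Registers an operator's event handler and returns its gateway. */
        Gateway register(String operatorId, Consumer<String> handler) {
            if (handlers.putIfAbsent(operatorId, handler) != null) {
                throw new IllegalStateException("already registered: " + operatorId);
            }
            return event -> sentToCoordinator.add(operatorId + ":" + event);
        }

        /** Delivers an event from the coordinator to the addressed operator. */
        void dispatch(String operatorId, String event) {
            Consumer<String> handler = handlers.get(operatorId);
            if (handler == null) {
                throw new IllegalArgumentException("unknown operator: " + operatorId);
            }
            handler.accept(event);
        }
    }

    /** Registers one operator, sends one event in each direction, returns a log. */
    public static List<String> run() {
        Dispatcher dispatcher = new Dispatcher();
        List<String> log = new ArrayList<>();
        Gateway gateway =
                dispatcher.register("source-1", e -> log.add("operator got " + e));
        gateway.sendToCoordinator("ready");
        dispatcher.dispatch("source-1", "assign split-a");
        log.addAll(dispatcher.sentToCoordinator);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The operator in this sketch depends only on the narrow `Gateway` and a handler callback, which is the decoupling the item above aims for.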
   
   ## Verifying this change
   
   This change is internal only so far (a building block for other features, 
like the new Source API).
   The change is tested mainly through unit tests, most importantly
   `flink-runtime : org.apache.flink.runtime.scheduler.OperatorCoordinatorSchedulerTest.java`
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): **no**
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **no**
     - The serializers: **no**
     - The runtime per-record code paths (performance sensitive): **no**
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: **yes**
     - The S3 file system connector: **no**
   
   ## Documentation
   
     - Does this pull request introduce a new feature? **internal feature**
     - If yes, how is the feature documented? **not applicable** *(docs will be 
added for the new source interface)*

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
