[GitHub] [flink] kl0u opened a new pull request #13556: Flink 19485 final

GitBox Wed, 07 Oct 2020 07:30:38 -0700


kl0u opened a new pull request #13556:
URL: https://github.com/apache/flink/pull/13556



   ## What is the purpose of the change
   
   Although `DataStream` is going to be the unified API for Batch and Streaming 
applications (see 
[FLIP-134](https://cwiki.apache.org/confluence/display/FLINK/FLIP-134%3A+Batch+execution+for+the+DataStream+API)),
some operations, _e.g._ Sinks, may need different runtime implementations 
depending on whether they are intended to run on bounded or unbounded data. 
This matters not only for optimisations but also for the exposed semantics, 
_i.e._ correctness.
   
   So far, DataStream had a 1-to-1 mapping between an API call and an operator. 
In a sense, the DataStream API was an "explicit" API. With this addition, we 
decouple the API calls from the actual runtime implementations of the 
operations and thus allow an operation to have more than one runtime 
implementation, depending (for now) on the `execution.runtime-mode`.
   
   In this PR we introduce (1) the `StreamGraphTranslator` interface, which is 
the main new entity responsible for translating a `Transformation` into its 
runtime implementation, (2) the framework with which a developer can write a 
`StreamGraphTranslator` for a new `Transformation` and wire it to the 
`StreamGraphGenerator`, and (3) an example implementation for the 
`OneInputTransformation`.
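   
   To illustrate the idea of decoupling an API call from its runtime 
implementation, here is a minimal, self-contained sketch. It is not the code in 
this PR: every type below (`TranslatorSketch`, `RuntimeMode`, `Translator`, 
`Generator`, `OneInput`, `OneInputTranslator`) is a hypothetical stand-in, and 
the "runtime implementation" is reduced to a descriptive string. It only shows 
how a generator might look up a translator per transformation type and pick the 
batch or streaming variant based on the configured mode.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; none of these types are Flink classes.
final class TranslatorSketch {

    /** Stand-in for the value of `execution.runtime-mode`. */
    enum RuntimeMode { STREAMING, BATCH }

    /** Stand-in for a transformation produced by an API call. */
    interface Transformation {
        String name();
    }

    /** Stand-in for a single-input transformation. */
    static final class OneInput implements Transformation {
        private final String name;

        OneInput(String name) {
            this.name = name;
        }

        @Override
        public String name() {
            return name;
        }
    }

    /** Assumed translator contract: one translation method per runtime mode. */
    interface Translator<T extends Transformation> {
        String translateForBatch(T transformation);

        String translateForStreaming(T transformation);
    }

    /** Example translator for the single-input stand-in. */
    static final class OneInputTranslator implements Translator<OneInput> {
        @Override
        public String translateForBatch(OneInput t) {
            return "batch-operator(" + t.name() + ")";
        }

        @Override
        public String translateForStreaming(OneInput t) {
            return "streaming-operator(" + t.name() + ")";
        }
    }

    /** Stand-in for the generator that translators are wired into. */
    static final class Generator {
        private final Map<Class<? extends Transformation>, Translator<?>> translators =
                new HashMap<>();
        private final RuntimeMode mode;

        Generator(RuntimeMode mode) {
            this.mode = mode;
        }

        <T extends Transformation> void register(Class<T> type, Translator<T> translator) {
            translators.put(type, translator);
        }

        @SuppressWarnings("unchecked")
        String translate(Transformation t) {
            // The cast is safe because register() pairs each class with a matching translator.
            Translator<Transformation> translator =
                    (Translator<Transformation>) translators.get(t.getClass());
            if (translator == null) {
                throw new IllegalStateException("No translator registered for " + t.getClass());
            }
            return mode == RuntimeMode.BATCH
                    ? translator.translateForBatch(t)
                    : translator.translateForStreaming(t);
        }
    }
}
```

   In this sketch, supporting a new transformation type means writing one more 
`Translator` and registering it. The dispatch logic in `Generator.translate` 
stays unchanged.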
   
   ## Brief change log
   
   The changes are mainly in the `StreamGraphGenerator` and the addition of the 
`StreamGraphTranslator`/`BaseSimpleTransformationTranslator`/`OneInputTranslator`.
   
   ## Verifying this change
   
   With the addition of the new `OneInputTranslator`, all existing tests 
(including e2e tests) using a `OneInputTransformation` also verify the changes 
in this PR.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): (yes / **no**)
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (yes / **no**)
     - The serializers: (yes / **no** / don't know)
     - The runtime per-record code paths (performance sensitive): (yes / **no** 
/ don't know)
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: (yes / **no** / 
don't know)
     - The S3 file system connector: (yes / **no** / don't know)
   
   ## Documentation
   
     - Does this pull request introduce a new feature? (yes / **no**)
     - If yes, how is the feature documented? (**not applicable** / docs / 
JavaDocs / not documented)
   
   **NOTE**: I would also like to move the `streaming.api.graph` package to a 
new `streaming.runtime.graph` package, as this seems more "correct". However, 
moving the `StreamGraph` would break binary compatibility because of 
`LocalStreamEnvironment.execute(streamGraph)` and 
`RemoteStreamEnvironment.execute(streamGraph)`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
