walterddr opened a new pull request #7678: [FLINK-11453][DataStreamAPI] Support 
SliceStream with forwardable pane info using slice assigner, operator and stream
URL: https://github.com/apache/flink/pull/7678
 
 
   
   
   
   **(The sections below can be removed for hotfixes of typos)**
   -->
   
   ## What is the purpose of the change
   
   This PR introduces a new `SlicedStream` abstract operation, which creates a 
resulting stream of the intermediate results buffered in the internal state of 
`WindowedOperator`. 
   It creates a `Slice` data type as a result to contain all necessary 
information of a pane slice.
   With this API. further processing is possible for operations:
   ```
   val slicedStream: slicedStream = inputStream
     .keyBy("key")
     .sliceWindow(Time.seconds(5L)) 
     .aggregate(aggFunc)
   val resultStream = slicedStream
       .window(Time.seconds(5000L))
       .aggregate(aggFunc)
   ```
   Is possible to create much more efficient sliding window operation, where 
elements won't have to be duplicated into each window.
   
   ## Brief change log
   
     - Added `SliceAssigner` that assigns elements into zero or one window 
(a.k.a. the "slice")
     - Modified `KeyedStream` and `WindowedStream` API to incorporate the 
creation of `SlicedStream`
     - Added `SlicedStream` concept that can emit slicing results.
     - Created special operators `SliceOperator` and `IterableSliceOperator` to 
process the intermediate results.
     - Added in TumblingEvent/ProcessingTimeSliceAssigner as an example.
   
   
   ## Verifying this change
   
   - This change is already covered by multiple tests for backward 
compatibility 
   - This change added tests and can be verified as follows:
     - Added integration tests for end-to-end processing for reduce, 
aggregation, and general apply
     - Added translation tests in scala for verifying `SlicedStream` API 
conversion chained with `KeyedStream`.
     - Added integration tests specifically tested serialization and 
deserialization to/from state snapshot.
   
   ## Does this pull request potentially affect one of the following parts:
   
     - Dependencies (does it add or upgrade a dependency): no
     - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: no
     - The serializers: no
     - The runtime per-record code paths (performance sensitive): yes
     - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: no
     - The S3 file system connector: no
   
   ## Documentation
   
     - Does this pull request introduce a new feature? yes
     - If yes, how is the feature documented? not yet, await review
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to