Elias Levy created FLINK-6243:
---------------------------------
Summary: Continuous Joins: True Sliding Window Joins
Key: FLINK-6243
URL: https://issues.apache.org/jira/browse/FLINK-6243
Project: Flink
Issue Type: New Feature
Components: Streaming
Affects Versions: 1.1.4
Reporter: Elias Levy
Flink defines sliding window joins as the join of elements of two streams that
share a window of time, where the windows are defined by advancing them forward
some amount of time that is less than the window time span. More generally,
such windows are just overlapping hopping windows.
Other systems, such as Kafka Streams, support a different notion of sliding
window joins. In these systems, two elements of a stream are joined if the
absolute time difference between the them is less or equal the time window
length.
This alternate notion of sliding window joins has some advantages in some
applications over the current implementation.
Elements to be joined may both fall within multiple overlapping sliding
windows, leading them to be joined multiple times, when we only wish them to be
joined once.
The implementation need not instantiate window objects to keep track of stream
elements, which becomes problematic in the current implementation if the window
size is very large and the slide is very small.
It allows for asymmetric time joins. E.g. join if elements from stream A are
no more than X time behind and Y time head of an element from stream B.
It is currently possible to implement a join with these semantics using
{{CoProcessFunction}}, but the capability should be a first class feature, such
as it is in Kafka Streams.
To perform the join, elements of each stream must be buffered for at least the
window time length. To allow for large window sizes and high volume of
elements, the state, possibly optionally, should be buffered such as it can
spill to disk (e.g. by using RocksDB).
The same stream may be joined multiple times in a complex topology. As an
optimization, it may be wise to reuse any element buffer among colocated join
operators. Otherwise, there may write amplification and increased state that
must be snapshotted.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)