Wang Yangjun created FLINK-3109:
-----------------------------------
Summary: Join two streams with two different cache time
Key: FLINK-3109
URL: https://issues.apache.org/jira/browse/FLINK-3109
Project: Flink
Issue Type: Improvement
Components: Streaming
Affects Versions: 0.10.1
Reporter: Wang Yangjun
Fix For: 0.10.2
Current Flink streaming only supports join two streams on the same window. How
to solve this problem?
For example, there are two streams. One is advertisements showed to users. The
tuple in which could be described as (id, showed timestamp). The other one is
click stream -- (id, clicked timestamp). We want get a joined stream, which
includes all the advertisement that is clicked by user in 20 minutes after
showed.
It is possible that after an advertisement is shown, some user click it
immediately. The "click" message could arrive to server earlier than "show"
message because of Internet delay. We assume that the maximum delay is one
minute.
Then the need is that we should alway keep a buffer(20 mins) of "show" stream
and another buffer(1 min) of "click" stream.
It would be grate that there is such an API like.
showStream.join(clickStream)
.where(keySelector)
.buffer(Time.of(20, TimeUnit.MINUTES))
.equalTo(keySelector)
.buffer(Time.of(1, TimeUnit.MINUTES))
.apply(JoinFunction)
http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)