[
https://issues.apache.org/jira/browse/FLINK-3109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ASF GitHub Bot updated FLINK-3109:
----------------------------------
Labels: easyfix patch pull-request-available (was: easyfix patch)
> Join two streams with two different buffer time
> -----------------------------------------------
>
> Key: FLINK-3109
> URL: https://issues.apache.org/jira/browse/FLINK-3109
> Project: Flink
> Issue Type: Improvement
> Components: API / DataStream
> Affects Versions: 0.10.1
> Reporter: Wang Yangjun
> Priority: Major
> Labels: easyfix, patch, pull-request-available
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> Current Flink streaming only supports join two streams on the same window.
> How to solve this problem?
> For example, there are two streams. One is advertisements showed to users.
> The tuple in which could be described as (id, showed timestamp). The other
> one is click stream -- (id, clicked timestamp). We want get a joined stream,
> which includes all the advertisement that is clicked by user in 20 minutes
> after showed.
> It is possible that after an advertisement is shown, some user click it
> immediately. It is possible that "click" message arrives server earlier than
> "show" message because of Internet delay. We assume that the maximum delay is
> one minute.
> Then the need is that we should alway keep a buffer(20 mins) of "show" stream
> and another buffer(1 min) of "click" stream.
> It would be grate that there is such an API like.
> showStream.join(clickStream)
> .where(keySelector)
> .buffer(Time.of(20, TimeUnit.MINUTES))
> .equalTo(keySelector)
> .buffer(Time.of(1, TimeUnit.MINUTES))
> .apply(JoinFunction)
> http://stackoverflow.com/questions/33849462/how-to-avoid-repeated-tuples-in-flink-slide-window-join/34024149#34024149
--
This message was sent by Atlassian Jira
(v8.3.4#803005)