[ 
https://issues.apache.org/jira/browse/FLINK-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-8482:
----------------------------------
    Component/s: API / DataStream

> Implement and expose option to use min / max / left / right timestamp for 
> joined streamrecords
> ----------------------------------------------------------------------------------------------
>
>                 Key: FLINK-8482
>                 URL: https://issues.apache.org/jira/browse/FLINK-8482
>             Project: Flink
>          Issue Type: Sub-task
>          Components: API / DataStream
>            Reporter: Florian Schmidt
>            Assignee: Florian Schmidt
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.8.0
>
>
> The idea: Expose the option of which timestamp to use for the result of a 
> join. The options currently floating around are:
>  * _left_: Use timestamp of the element in a join that came from the left 
> stream
>  * _right_: Use timestamp of the element in a join that came from the right 
> stream
>  * _max_: Use the max timestamp of both elements in a join
>  * _min_: Use the min timestamp of both elements in a join
> All options but _max_ require introducing delayed watermarks in the 
> operator, which is something that we were hesitant to do until now. This 
> should probably undergo discussion once more in order to see if / how we 
> want to add this now. We could even think of exposing this in a more general 
> way by adding a base operator that allows delayed watermarks.
> This will also be groundwork for supporting outer joins (FLINK-8483), for 
> which we need watermark delays in any case to provide correctness. 
> Also, the API for this needs some feedback in order to expose this in a 
> powerful, yet clear way. In my PoC at [1] I used the naming convention left / 
> right to refer to specific streams, which is currently not something the API 
> exposes to the user; we should probably use something more clever here.
> Example
> {code:java}
> keyedStreamOne
>    .intervalJoin(keyedStreamTwo)
>    .between(Time.milliseconds(0), Time.milliseconds(2))
>    // alternatives: .assignMaxTimestamp(), .assignLeftTimestamp(),
>    // .assignRightTimestamp()
>    .assignMinTimestamp()
>    .process(new ProcessJoinFunction() { /* impl */ });
> {code}
>  
> Any feedback is highly appreciated!
> [1] 
> https://github.com/florianschmidt1994/flink/tree/flink-8482-add-option-for-different-timestamp-strategies-to-interval-join-operator
> cc [~StephanEwen] [~kkl0u]
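
To make the four options in the description concrete, here is a minimal,
standalone Java sketch. It is not Flink code and not the PoC from [1]: the
class JoinTimestampSketch, the enum Strategy, and both methods are
hypothetical names chosen for illustration. The watermarkDelay formula is an
assumption derived from the interval condition
leftTs + lower <= rightTs <= leftTs + upper; it estimates how far a result
timestamp can lag behind the element that triggered the join, which is one
way to read why every option except max may need delayed watermarks.

```java
public class JoinTimestampSketch {

    /** Hypothetical enum mirroring the four options in the issue; not a Flink API. */
    public enum Strategy {
        LEFT, RIGHT, MIN, MAX;

        /** Timestamp assigned to a joined pair under this strategy. */
        public long resultTimestamp(long leftTs, long rightTs) {
            switch (this) {
                case LEFT:  return leftTs;
                case RIGHT: return rightTs;
                case MIN:   return Math.min(leftTs, rightTs);
                default:    return Math.max(leftTs, rightTs);
            }
        }

        /**
         * Assumed worst-case lag of the result timestamp behind the element
         * that triggered the join, given leftTs + lower <= rightTs <= leftTs + upper.
         */
        public long watermarkDelay(long lower, long upper) {
            long leftLag = Math.max(0L, upper);   // left can be up to `upper` older than right
            long rightLag = Math.max(0L, -lower); // right can be up to `-lower` older than left
            switch (this) {
                case LEFT:  return leftLag;
                case RIGHT: return rightLag;
                case MIN:   return Math.max(leftLag, rightLag);
                default:    return 0L;            // max never lags behind the triggering element
            }
        }
    }

    public static void main(String[] args) {
        long lower = 0L;
        long upper = 2L; // same bounds as the between(...) call in the example
        for (Strategy s : Strategy.values()) {
            System.out.println(s + ": timestamp=" + s.resultTimestamp(10L, 12L)
                    + ", delay=" + s.watermarkDelay(lower, upper));
        }
    }
}
```

Under these assumptions the delay depends on the join bounds: with a
non-negative lower bound the right timestamp never lags, while a negative
lower bound would make it lag as well.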



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
