[
https://issues.apache.org/jira/browse/FLINK-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268970#comment-16268970
]
ASF GitHub Bot commented on FLINK-8158:
---------------------------------------
Github user xccui commented on the issue:
https://github.com/apache/flink/pull/5094
Hi @hequn8128, thanks for looking into this.
I've checked the current implementation and found that it can indeed emit
late data. However, that is caused by the checks below:
https://github.com/apache/flink/blob/427dfe42e2bea891b40e662bc97cdea57cdae3f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/TimeBoundedStreamInnerJoin.scala#L173
and
https://github.com/apache/flink/blob/427dfe42e2bea891b40e662bc97cdea57cdae3f5/flink-libraries/flink-table/src/main/scala/org/apache/flink/table/runtime/join/TimeBoundedStreamInnerJoin.scala#L234
In some situations, they do not prevent late rows from being
processed and emitted. Honestly, I cannot come up with a solution right
away. Do you want to continue working on that? Or I could take it over, if you
don't mind.
Thanks, Xingcan
> Rowtime window inner join emits late data
> -----------------------------------------
>
> Key: FLINK-8158
> URL: https://issues.apache.org/jira/browse/FLINK-8158
> Project: Flink
> Issue Type: Bug
> Components: Table API & SQL
> Reporter: Hequn Cheng
> Assignee: Hequn Cheng
> Attachments: screenshot-1xxx.png
>
>
> When executing the join, the join operator needs to make sure that no late
> data is emitted. Currently, this is achieved by holding back watermarks.
> However, the window border is not handled correctly. For the SQL below:
> {quote}
> val sqlQuery =
> """
> SELECT t2.key, t2.id, t1.id
> FROM T1 as t1 join T2 as t2 ON
> t1.key = t2.key AND
> t1.rt BETWEEN t2.rt - INTERVAL '5' SECOND AND
> t2.rt + INTERVAL '1' SECOND
> """.stripMargin
> val data1 = new mutable.MutableList[(String, String, Long)]
> // for boundary test
> data1.+=(("A", "LEFT1", 6000L))
> val data2 = new mutable.MutableList[(String, String, Long)]
> data2.+=(("A", "RIGHT1", 6000L))
> {quote}
> The join will output a watermark with timestamp 1000, but if the left input
> then receives another row ("A", "LEFT1", 1000L), the join will output a record
> with timestamp 1000, which equals the previous watermark and is therefore late.
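
The boundary arithmetic in the description can be checked with a small standalone sketch. It is not the operator's code: the object and method names are hypothetical, and it assumes that the input watermark is 6000 after both rows have arrived and that the output watermark is held back by the larger of the two window bounds (5000 ms), which matches the value 1000 reported above.

```scala
object BoundarySketch {
  // Bounds from the query: t1.rt BETWEEN t2.rt - 5s AND t2.rt + 1s
  val lowerBoundMs: Long = -5000L
  val upperBoundMs: Long = 1000L

  // Assumption: the output watermark is delayed by the larger bound.
  val watermarkDelayMs: Long = math.max(-lowerBoundMs, upperBoundMs)

  // Watermark forwarded downstream for a given input watermark.
  def heldBackWatermark(inputWatermark: Long): Long =
    inputWatermark - watermarkDelayMs

  // True if a left row at leftRt falls inside the window of a right row at rightRt.
  def inJoinWindow(leftRt: Long, rightRt: Long): Boolean =
    leftRt >= rightRt + lowerBoundMs && leftRt <= rightRt + upperBoundMs

  // A watermark t asserts that no further records with timestamp <= t follow.
  def isLate(timestamp: Long, watermark: Long): Boolean =
    timestamp <= watermark

  def main(args: Array[String]): Unit = {
    val wm = heldBackWatermark(6000L)
    println(s"output watermark: $wm")
    println(s"LEFT(1000) joins RIGHT(6000): ${inJoinWindow(1000L, 6000L)}")
    println(s"result with timestamp 1000 is late: ${isLate(1000L, wm)}")
  }
}
```

Under these assumptions the row with timestamp 1000 sits exactly on the lower window border, so it still matches RIGHT1 even though a watermark of 1000 has already been emitted.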
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)