[ 
https://issues.apache.org/jira/browse/FLINK-8158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268653#comment-16268653
 ] 

ASF GitHub Bot commented on FLINK-8158:
---------------------------------------

GitHub user hequn8128 opened a pull request:

    https://github.com/apache/flink/pull/5094

    [FLINK-8158] [table] Fix rowtime window inner join emits late data bug

    
    ## What is the purpose of the change
    
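    This pull request fixes a bug in which the rowtime window inner join emits 
late data. When executing the join, the join operator needs to make sure that no 
late data is emitted. However, the window border is not handled correctly: a 
joined record whose timestamp lies exactly on the border can equal a watermark 
that was already emitted.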
    
    
    ## Brief change log
    
      - Set `WatermarkDelay` to `MaxOutputDelay + 1` instead of `MaxOutputDelay`
      - Add tests in `JoinHarnessTest`
    
    
    ## Verifying this change
    
    This change added tests and can be verified as follows:
    
      - *Added tests in `JoinHarnessTest` to check that no late data is emitted*
    
    ## Does this pull request potentially affect one of the following parts:
    
      - Dependencies (does it add or upgrade a dependency): (no)
      - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: (no)
      - The serializers: (no)
      - The runtime per-record code paths (performance sensitive): (no)
      - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
      - The S3 file system connector: (no)
    
    ## Documentation
    
      - Does this pull request introduce a new feature? (no)


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hequn8128/flink 8158

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/5094.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #5094
    
----
commit 37d08da40150d26b0c58daf1bc85b69675bf5d64
Author: 军长 <hequn....@alibaba-inc.com>
Date:   2017-11-28T10:58:52Z

    [FLINK-8158] [table] Fix Rowtime window inner join emits late data bug

----


> Rowtime window inner join emits late data
> -----------------------------------------
>
>                 Key: FLINK-8158
>                 URL: https://issues.apache.org/jira/browse/FLINK-8158
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>            Reporter: Hequn Cheng
>            Assignee: Hequn Cheng
>         Attachments: screenshot-1xxx.png
>
>
> When executing the join, the join operator needs to make sure that no late 
> data is emitted. Currently, this is achieved by holding back watermarks. 
> However, the window border is not handled correctly. For the SQL below: 
> {quote}
>     val sqlQuery =
>       """
>         SELECT t2.key, t2.id, t1.id
>         FROM T1 as t1 join T2 as t2 ON
>           t1.key = t2.key AND
>           t1.rt BETWEEN t2.rt - INTERVAL '5' SECOND AND
>             t2.rt + INTERVAL '1' SECOND
>         """.stripMargin
>     val data1 = new mutable.MutableList[(String, String, Long)]
>     // for boundary test
>     data1.+=(("A", "LEFT1", 6000L))
>     val data2 = new mutable.MutableList[(String, String, Long)]
>     data2.+=(("A", "RIGHT1", 6000L))
> {quote}
> The join will output a watermark with timestamp 1000, but if the left input 
> then receives another record ("A", "LEFT1", 1000L), the join will output a 
> record with timestamp 1000, which equals the previous watermark and is 
> therefore late.
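
The border case described above can be sketched with plain arithmetic. This is only an illustration, not the code in the pull request: `WatermarkDelaySketch`, `emittedWatermark` and `isLate` are hypothetical names, the input watermark of 6000 is an assumption chosen to match the emitted watermark of 1000 in the description, and the maximum output delay is assumed to be the larger of the two interval bounds in the query.

```scala
// Hypothetical sketch: none of these names are Flink classes or methods.
object WatermarkDelaySketch {
  // Interval bounds taken from the query in the issue:
  // t1.rt BETWEEN t2.rt - INTERVAL '5' SECOND AND t2.rt + INTERVAL '1' SECOND
  val lowerBoundMs: Long = 5000L
  val upperBoundMs: Long = 1000L

  // Assumption: the maximum output delay is the larger of the two bounds.
  val maxOutputDelay: Long = math.max(lowerBoundMs, upperBoundMs)

  // Watermark forwarded downstream after holding it back by `delay`.
  def emittedWatermark(inputWatermark: Long, delay: Long): Long =
    inputWatermark - delay

  // A watermark w declares that no later record has a timestamp <= w.
  def isLate(recordTimestamp: Long, watermark: Long): Boolean =
    recordTimestamp <= watermark

  def main(args: Array[String]): Unit = {
    val inputWatermark = 6000L     // assumed, see the note above the block
    val joinedRowTimestamp = 1000L // ("A", "LEFT1", 1000L) joined with RIGHT1

    // Delay before the fix (MaxOutputDelay) and after it (MaxOutputDelay + 1).
    val oldWatermark = emittedWatermark(inputWatermark, maxOutputDelay)
    val newWatermark = emittedWatermark(inputWatermark, maxOutputDelay + 1)

    println(s"old: watermark=$oldWatermark late=${isLate(joinedRowTimestamp, oldWatermark)}")
    println(s"new: watermark=$newWatermark late=${isLate(joinedRowTimestamp, newWatermark)}")
  }
}
```

Under these assumptions, holding the watermark back by one extra millisecond keeps the emitted watermark strictly below the earliest timestamp a joined record can still carry.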



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
