[ 
https://issues.apache.org/jira/browse/FLINK-37005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Piotr Nowojski closed FLINK-37005.
----------------------------------
    Fix Version/s: 2.1.0
     Release Note: 
Newly submitted deduplication "keep first row" SQL queries, both event time and 
processing time, will be compiled as append only. This can significantly 
improve the performance of such queries, for example because append-only output 
no longer requires `SinkUpsertMaterializer`.

Row time keep first row deduplication will now emit results on 
watermarks and never retract or modify them, instead of emitting some result 
ASAP and then potentially retracting it if an older (event time) record arrives. 
       Resolution: Fixed

Merged to master as 7bb152b5af4..a08e6021423
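
The row time behaviour described in the release note can be illustrated with a 
small, self-contained model. This is only a sketch under stated assumptions, not 
the Flink operator: the class name `KeepFirstRowOnWatermark`, the row shape 
`(key, event_time, payload)` and both method names are hypothetical, and late 
records, state TTL and checkpointing are ignored. It shows why the output can be 
insert only: a row is emitted once the watermark has passed its event time, 
after which no older on-time record for that key can arrive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical row shape for this sketch: (key, event_time, payload).
Row = Tuple[str, int, str]


@dataclass
class KeepFirstRowOnWatermark:
    """Toy model of row time keep-first-row deduplication (not Flink code)."""

    # Earliest row seen so far per key that has not been emitted yet.
    pending: Dict[str, Row] = field(default_factory=dict)
    # Keys whose first row was already emitted; their result is final.
    emitted: Set[str] = field(default_factory=set)

    def on_record(self, row: Row) -> List[Row]:
        key, event_time, _ = row
        if key in self.emitted:
            return []  # first row already emitted, drop the duplicate
        current = self.pending.get(key)
        if current is None or event_time < current[1]:
            self.pending[key] = row  # keep the oldest candidate
        return []  # nothing is emitted per record

    def on_watermark(self, watermark: int) -> List[Row]:
        # Candidates at or below the watermark can no longer be replaced
        # by an older on-time record, so they are emitted as plain inserts.
        ready = [r for r in self.pending.values() if r[1] <= watermark]
        for key, _, _ in ready:
            del self.pending[key]
            self.emitted.add(key)
        return sorted(ready, key=lambda r: r[1])


op = KeepFirstRowOnWatermark()
op.on_record(("a", 5, "second"))
op.on_record(("a", 3, "first"))  # older record replaces the candidate
op.on_record(("b", 9, "only"))
print(op.on_watermark(6))   # [('a', 3, 'first')]
op.on_record(("a", 7, "dup"))    # dropped, key "a" is already final
print(op.on_watermark(10))  # [('b', 9, 'only')]
```

In this model the older record for key "a" replaces the candidate silently 
before anything is emitted, whereas an operator that emits on each record would 
have had to emit the first candidate and then retract it.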

> Make StreamExecDeduplicate output insert only where possible
> ------------------------------------------------------------
>
>                 Key: FLINK-37005
>                 URL: https://issues.apache.org/jira/browse/FLINK-37005
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table SQL / Planner, Table SQL / Runtime
>    Affects Versions: 2.0.0
>            Reporter: Piotr Nowojski
>            Assignee: Piotr Nowojski
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.1.0
>
>
> According to the planner, {{StreamExecDeduplicate}} currently always outputs 
> updates/retractions, even when this is not the case in the runtime. 
> This can cause performance problems, for example forcing the planner to add a 
> {{SinkUpsertMaterializer}} operator downstream from the deduplication, even 
> though it's not actually necessary. 
> In this ticket, I would like to both support outputting insert only and 
> increase the number of cases where that's actually the case.
> # Proc time keep first row is already implemented in such a way 
> that it outputs inserts only, but this is not actually used/marked in the 
> planner (planner change only)
> # Row time keep first row could also be implemented to output inserts only, 
> with an operator that emits the deduplication result on watermark, instead of 
> on each record (planner + runtime change)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
