[jira] [Commented] (FLINK-25471) wrong result if table transfrom to DataStream then keyby sum in Batch Mode

Yao Zhang (Jira) Thu, 06 Jan 2022 17:09:05 -0800


    [ 
https://issues.apache.org/jira/browse/FLINK-25471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470277#comment-17470277
 ]


Yao Zhang commented on FLINK-25471:
-----------------------------------

Hi [~twalthr] ,

In Flink 1.13 it transforms sum to StreamGroupedReduceOperator even though it 
is executed in batch mode, which is exactly a bug. Conversely in Flink 1.14 it 
turns out to be BatchGroupedReduceOperator and it is correct. But there are 
still some issues for BatchGroupedReduceOperator. It will never output the 
latest accumulated value right before task manager shuts down because it only 
register an EventTimeTimer when the next element with the different key 
arrives. That is the reason why in batch mode some elements may be missing. I 
would like to fix it.

> wrong result if table transfrom to DataStream then keyby  sum in Batch Mode
> ---------------------------------------------------------------------------
>
>                 Key: FLINK-25471
>                 URL: https://issues.apache.org/jira/browse/FLINK-25471
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Runtime
>    Affects Versions: 1.14.2
>         Environment: mac book pro m1 
> jdk 8 
> scala 2.11
> flink 1.14.2
> idea 2020
>            Reporter: zhangzh
>            Priority: Critical
>         Attachments: TableToDataStreamBatchWordCount-1.scala, pom.xml
>
>
> I have a dataStream with 6 lines datas like this:
> Row.of("Alice"),
> Row.of("alice"),
> Row.of("Bob"),
> Row.of("lily"),
> Row.of("lily"),
> Row.of("lily")
> then  make it to  table  with one column "word"
> then sql transform : select upper(word) from tmp_table
> then change to dataStream
> then keyby sum.
>  
> in batch mode:
> I think correct result is:
> > (BOB,1)
> > (ALICE,2)
> > (LILY,3)
>  
> but the result is :
> > (BOB,1)
> if i set different parallelism ,the result is different.
>  
> the source file  and pom is in attach.
>  is  a bug?
> pelease help me!!!
>  
>  
>  
>  
>  
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

[jira] [Commented] (FLINK-25471) wrong result if table transfrom to DataStream then keyby sum in Batch Mode

Reply via email to