[
https://issues.apache.org/jira/browse/FLINK-25471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470277#comment-17470277
]
Yao Zhang commented on FLINK-25471:
-----------------------------------
Hi [~twalthr] ,
In Flink 1.13 it transforms sum to StreamGroupedReduceOperator even though it
is executed in batch mode, which is exactly a bug. Conversely in Flink 1.14 it
turns out to be BatchGroupedReduceOperator and it is correct. But there are
still some issues for BatchGroupedReduceOperator. It will never output the
latest accumulated value right before task manager shuts down because it only
register an EventTimeTimer when the next element with the different key
arrives. That is the reason why in batch mode some elements may be missing. I
would like to fix it.
> wrong result if table transfrom to DataStream then keyby sum in Batch Mode
> ---------------------------------------------------------------------------
>
> Key: FLINK-25471
> URL: https://issues.apache.org/jira/browse/FLINK-25471
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / API, Table SQL / Runtime
> Affects Versions: 1.14.2
> Environment: mac book pro m1
> jdk 8
> scala 2.11
> flink 1.14.2
> idea 2020
> Reporter: zhangzh
> Priority: Critical
> Attachments: TableToDataStreamBatchWordCount-1.scala, pom.xml
>
>
> I have a dataStream with 6 lines datas like this:
> Row.of("Alice"),
> Row.of("alice"),
> Row.of("Bob"),
> Row.of("lily"),
> Row.of("lily"),
> Row.of("lily")
> then make it to table with one column "word"
> then sql transform : select upper(word) from tmp_table
> then change to dataStream
> then keyby sum.
>
> in batch mode:
> I think correct result is:
> > (BOB,1)
> > (ALICE,2)
> > (LILY,3)
>
> but the result is :
> > (BOB,1)
> if i set different parallelism ,the result is different.
>
> the source file and pom is in attach.
> is a bug?
> pelease help me!!!
>
>
>
>
>
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)