[ 
https://issues.apache.org/jira/browse/FLINK-25683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17478355#comment-17478355
 ] 

Yao Zhang commented on FLINK-25683:
-----------------------------------

Hi [~twalthr] ,

This issue and FLINK-25471 is quite similar.  The root cause is 
BatchExecutionKeyedStateBackend calls 

notifyKeySelected only if it receives the data with different key. As a result, 
last elements(right before Task Manager exits) stored in 
BatchExecutionKeyedStateBackend the will never have a chance to be collected by 
the downstream. I think it is the root cause.

I plan to change AbstractStreamOperator by extending BoundedMultiInput and set 
the key to a designed value when the input reaches END_OF_DATA. By doing this, 
in batch mode it will trigger notifyKeySelected and finally all elements will 
be collected. It might not have any side effect in streaming mode. Both 
FLINK-25471 and this ticket can be solved by this change. Correct me if I am 
wrong.

Could you please assign this ticket to me?

> wrong result if table transfrom to DataStream then window process in batch 
> mode
> -------------------------------------------------------------------------------
>
>                 Key: FLINK-25683
>                 URL: https://issues.apache.org/jira/browse/FLINK-25683
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / API, Table SQL / Runtime
>    Affects Versions: 1.14.2
>         Environment: mac book pro m1 
> jdk 8 
> scala 2.11
> flink 1.14.2
> idea 2020
>            Reporter: zhangzh
>            Priority: Major
>         Attachments: TableToDataStreamBatchWindowTest.scala, pom.xml
>
>
> I have 5 line datas,
> i first need to transform current data with SQL
> then mix current data and historical data which is batch get from hbase
> for some special reason the program must run in batch mode
> i think the correct result should be like this:
> (BOB,1)
> (EMA,1)
> (DOUG,1)
> (ALICE,1)
> (CENDI,1)
> but the result is :
> (EMA,1)
>  
> if i set different parallelism ,the result is different.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

Reply via email to