[ 
https://issues.apache.org/jira/browse/FLINK-29652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618888#comment-17618888
 ] 

Jingqi Shi commented on FLINK-29652:
------------------------------------

[~luoyuxia] The key point is MaterializedCollectResultBase.materializedTable. 
In BATCH mode, ResultRetrievalThread processes each record and puts the row 
into materializedTable, and the MaterializedCollectResultBase.snapshot method 
transfers all rows from materializedTable to the snapshot array list.

 

Writes to and reads from the materializedTable array list are not done record 
by record in BATCH mode, so the result read from the materializedTable/snapshot 
array list is not exactly-once: rows may be duplicated or missing. 

 

For example, in our case, if ResultRetrievalThread has processed all records 
but isRunning has not yet been set to false, the 
MaterializedCollectResultBase.snapshot method can still transfer the query 
result from materializedTable to snapshot, so we get duplicate rows.
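
The ordering described above can be replayed with a small single-threaded 
model. This is only a sketch and not Flink code: the class SnapshotRaceSketch, 
the String rows, and the replay method are hypothetical, and it assumes that 
snapshot copies rows without removing them from materializedTable and that the 
reader appends every snapshot it receives to its output.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified model of the interleaving; names are illustrative, not Flink's. */
final class SnapshotRaceSketch {
    // Rows written by the (modelled) result retrieval thread.
    private final List<String> materializedTable = new ArrayList<>();
    // Stays true until the retrieval thread marks itself as finished.
    private boolean isRunning = true;

    void processRecord(String row) {
        materializedTable.add(row);
    }

    void finish() {
        isRunning = false;
    }

    boolean isRunning() {
        return isRunning;
    }

    /** Assumption: every call hands out all rows currently in materializedTable. */
    List<String> snapshot() {
        return new ArrayList<>(materializedTable);
    }

    /** Replays the problematic ordering and returns what a polling reader collects. */
    public static List<String> replay() {
        SnapshotRaceSketch result = new SnapshotRaceSketch();
        List<String> collected = new ArrayList<>();

        result.processRecord("1");           // all records are processed ...
        if (result.isRunning()) {            // ... but the flag is still true,
            collected.addAll(result.snapshot()); // so the reader takes a snapshot
        }
        result.finish();                     // the flag is cleared only now
        collected.addAll(result.snapshot()); // the final snapshot holds the same row
        return collected;
    }

    public static void main(String[] args) {
        System.out.println(replay());        // the single row appears twice
    }
}
```

In this model the table holds one row, yet the reader ends up with two, which 
matches the duplicated output shown in the issue description.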

> get duplicate result from sql-client in BATCH mode
> --------------------------------------------------
>
>                 Key: FLINK-29652
>                 URL: https://issues.apache.org/jira/browse/FLINK-29652
>             Project: Flink
>          Issue Type: Bug
>          Components: Table SQL / Client
>    Affects Versions: 1.13.0, 1.14.0, 1.15.0, 1.16.0
>            Reporter: Jingqi Shi
>            Priority: Major
>
> In BATCH mode, we experienced problems with flink-sql-client when retrieving 
> result records. We occasionally get duplicate rows even when querying a 
> hive/hudi table that contains only one record.
>  
> For example, for SELECT COUNT(1) AS val FROM x.test_hive_table, we may get:
> {code:java}
> +------+
> | val  |
> +------+
> | 1    |
> | …    |
> | 1    |
> +------+ {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
