[ https://issues.apache.org/jira/browse/FLINK-29652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618888#comment-17618888 ]
Jingqi Shi commented on FLINK-29652:
------------------------------------
[~luoyuxia] The key point is MaterializedCollectResultBase.materializedTable.
In BATCH mode, ResultRetrievalThread processes each record and puts the row
into materializedTable, and the MaterializedCollectResultBase.snapshot method
transfers all rows from materializedTable to the snapshot array list.
In BATCH mode, writes to and reads from the materializedTable array list are
not done record by record, so the result read from the
materializedTable/snapshot array list is not exactly-once and may contain
duplicate or missing rows.
For example, in our case, if ResultRetrievalThread has processed all records
but isRunning has not yet been set to false, the
MaterializedCollectResultBase.snapshot method can still transfer the query
result from materializedTable to snapshot, so we get duplicate rows.
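
The interleaving described above can be sketched with a small, single-threaded
model. This is only an illustration under assumptions, not Flink's actual
code: the class MaterializedResult, the methods processRecord, markFinished
and simulateClient, and the idea that the client appends every snapshot it
receives to its output are all hypothetical simplifications. Only the names
materializedTable, snapshot and isRunning are taken from the comment.

```java
import java.util.ArrayList;
import java.util.List;

class SnapshotRaceSketch {

    // Hypothetical stand-in for the materialized result; NOT Flink's class.
    static class MaterializedResult {
        private final List<String> materializedTable = new ArrayList<>();
        private final List<String> snapshot = new ArrayList<>();
        private boolean isRunning = true;

        // Simulates the retrieval thread storing one row.
        void processRecord(String row) {
            materializedTable.add(row);
        }

        // Simulates the retrieval thread flagging that it has stopped.
        void markFinished() {
            isRunning = false;
        }

        // Copies ALL rows seen so far into the snapshot list.
        // Returns null once the producer is flagged as stopped (assumed
        // end-of-stream signal for this sketch).
        List<String> snapshot() {
            if (!isRunning) {
                return null;
            }
            snapshot.clear();
            snapshot.addAll(materializedTable);
            return new ArrayList<>(snapshot);
        }
    }

    // Replays one possible interleaving and returns what the client prints.
    static List<String> simulateClient() {
        MaterializedResult result = new MaterializedResult();
        List<String> printed = new ArrayList<>();

        // The retrieval thread has processed the only record ...
        result.processRecord("1");

        // ... but isRunning is still true, so two client polls both
        // succeed and both return the same full set of rows.
        printed.addAll(result.snapshot());
        printed.addAll(result.snapshot());

        // Only now is the flag cleared; the next poll ends the loop.
        result.markFinished();
        if (result.snapshot() != null) {
            throw new IllegalStateException("expected end of stream");
        }
        return printed;
    }

    public static void main(String[] args) {
        System.out.println(simulateClient()); // prints [1, 1]
    }
}
```

In this model the single row is reported twice because each snapshot covers
the whole table rather than only the rows added since the previous poll, and
the poll loop ends only when the flag changes.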
> get duplicate result from sql-client in BATCH mode
> --------------------------------------------------
>
> Key: FLINK-29652
> URL: https://issues.apache.org/jira/browse/FLINK-29652
> Project: Flink
> Issue Type: Bug
> Components: Table SQL / Client
> Affects Versions: 1.13.0, 1.14.0, 1.15.0, 1.16.0
> Reporter: Jingqi Shi
> Priority: Major
>
> In BATCH mode, we experienced problems with flink-sql-client when retrieving
> result records. We occasionally get duplicate rows, even when querying a
> hive/hudi table that contains only one record.
>
> For example, running SELECT COUNT(1) AS val FROM x.test_hive_table may return:
> {code:java}
> +------+
> | val |
> +------+
> | 1 |
> | … |
> | 1 |
> +------+ {code}
>