voon created HUDI-6087:
--------------------------
Summary: Hudi streaming-read is not stopping with savepoint correctly
Key: HUDI-6087
URL: https://issues.apache.org/jira/browse/HUDI-6087
Project: Apache Hudi
Issue Type: Bug
Reporter: voon
Flink supports stopping with savepoint as documented here:
[https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#stopping-a-job-with-savepoint]
Stopping with savepoint invokes the following three Flink interface functions, in this order:
# cancel()
# snapshotState()
# close()
However, with the current implementation, *issuedInstant* is already *null* by the time *snapshotState()* is invoked during a stop-with-savepoint. This is because *cancel()* sets *issuedInstant* to *null*, so *snapshotState()* then adds a *null* value to the *ListState*.
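The call sequence above can be modelled without Flink in a minimal sketch. MonitorSketch, StopWithSavepointDemo, the clearInCancel flag and the instant value are all hypothetical. A plain List<String> stands in for Flink's ListState, and clearing the marker in close() instead of cancel() is only one assumed way to avoid persisting *null*, not a confirmed fix.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Flink-free model of the streaming-read monitoring source.
// `state` stands in for Flink's ListState<String>; all names are illustrative.
class MonitorSketch {
    String issuedInstant;                          // last instant handed downstream
    final List<String> state = new ArrayList<>();  // what the savepoint would contain
    private final boolean clearInCancel;           // true = behaviour described in this issue

    MonitorSketch(String issuedInstant, boolean clearInCancel) {
        this.issuedInstant = issuedInstant;
        this.clearInCancel = clearInCancel;
    }

    void cancel() {
        if (clearInCancel) {
            issuedInstant = null; // progress marker is lost before the snapshot is taken
        }
    }

    void snapshotState() {
        state.clear();
        state.add(issuedInstant); // stores whatever the marker currently is, including null
    }

    void close() {
        issuedInstant = null; // assumed alternative: clear only after the snapshot is taken
    }

    // Replays the stop-with-savepoint call order listed above.
    List<String> stopWithSavepoint() {
        cancel();
        snapshotState();
        close();
        return state;
    }
}

class StopWithSavepointDemo {
    public static void main(String[] args) {
        String instant = "20230412101500000"; // hypothetical Hudi instant time
        System.out.println(new MonitorSketch(instant, true).stopWithSavepoint());  // [null]
        System.out.println(new MonitorSketch(instant, false).stopWithSavepoint()); // [20230412101500000]
    }
}
```

In the first run the savepoint ends up holding only *null*, so the last consumed instant cannot be recovered on restore.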
As a result, when resuming from the savepoint, there will be:
# data loss if the LATEST *_read.start.commit_* is used (i.e. no value is specified, so the default applies)
# duplicate data if the EARLIEST *_read.start.commit_* is used
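Why a *null* restored instant produces these two outcomes can be illustrated with a hedged sketch of the restore decision. ResumeSketch, the StartCommit enum and the commit names c1 to c5 are hypothetical placeholders, and treating LATEST as "only commits strictly after the newest commit at restart time" is an assumption about the fallback, not a description of Hudi's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a restored streaming read picks the commits it reads next.
class ResumeSketch {
    enum StartCommit { EARLIEST, LATEST } // stand-ins for read.start.commit values

    static List<String> commitsToRead(List<String> timeline, String restoredInstant, StartCommit fallback) {
        List<String> out = new ArrayList<>();
        if (timeline.isEmpty()) {
            return out;
        }
        String after;
        if (restoredInstant != null) {
            after = restoredInstant;                     // normal resume: continue after the last issued commit
        } else if (fallback == StartCommit.EARLIEST) {
            after = "";                                  // every commit name sorts after ""
        } else {
            after = timeline.get(timeline.size() - 1);   // assumed LATEST: only commits after the newest one
        }
        for (String commit : timeline) {
            if (commit.compareTo(after) > 0) {
                out.add(commit);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // c1..c3 were emitted before the savepoint; c4 and c5 landed while the job was stopped.
        List<String> timeline = List.of("c1", "c2", "c3", "c4", "c5");
        System.out.println(commitsToRead(timeline, "c3", StartCommit.LATEST));   // [c4, c5]
        System.out.println(commitsToRead(timeline, null, StartCommit.LATEST));   // [] (c4, c5 lost)
        System.out.println(commitsToRead(timeline, null, StartCommit.EARLIEST)); // [c1, c2, c3, c4, c5] (c1..c3 duplicated)
    }
}
```

With a valid restored instant the reader picks up exactly the commits it missed; with *null* it must fall back to *_read.start.commit_*, which either skips or replays data.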
--
This message was sent by Atlassian Jira
(v8.20.10#820010)