voon created HUDI-6087:
--------------------------

             Summary: Hudi streaming-read is not stopping with savepoint 
correctly
                 Key: HUDI-6087
                 URL: https://issues.apache.org/jira/browse/HUDI-6087
             Project: Apache Hudi
          Issue Type: Bug
            Reporter: voon


Flink supports stopping with savepoint as documented here:

[https://nightlies.apache.org/flink/flink-docs-master/docs/ops/state/savepoints/#stopping-a-job-with-savepoint]

 

Stopping with savepoint invokes the following 3 Flink interface functions:
 # cancel()
 # snapshotState()
 # close()

 

However, in the current implementation, stopping with savepoint causes 
*issuedInstant* to be *null* when *snapshotState()* is invoked. This is because 
*cancel()* sets *issuedInstant* to *null*, so *snapshotState()* adds a *null* 
value to the *ListState* list.
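
The call order above can be reproduced with a minimal sketch that does not depend on Flink or Hudi. The class name, the plain list standing in for the Flink list state, and the clearOnCancel flag are all assumptions made for illustration; this is not the Hudi source function, and the second variant is only one possible way to avoid the problem, not necessarily the project's fix.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a streaming-read source function (hypothetical, not the Hudi class).
class StopWithSavepointSketch {
    // Last instant issued for reading; mirrors the issuedInstant field named in this report.
    private String issuedInstant;
    // Plain list standing in for the Flink list state that gets checkpointed.
    private final List<String> instantState = new ArrayList<>();
    // Hypothetical switch: true mimics the reported behaviour, false keeps the instant on cancel.
    private final boolean clearOnCancel;

    StopWithSavepointSketch(String issuedInstant, boolean clearOnCancel) {
        this.issuedInstant = issuedInstant;
        this.clearOnCancel = clearOnCancel;
    }

    void cancel() {
        if (clearOnCancel) {
            issuedInstant = null; // the reported behaviour: the instant is lost here
        }
    }

    void snapshotState() {
        instantState.clear();
        instantState.add(issuedInstant); // adds null if cancel() already cleared the field
    }

    void close() {
        // nothing to release in this sketch
    }

    // Runs the three calls in the order used by stop-with-savepoint and returns the saved state.
    public static List<String> stopWithSavepoint(String instant, boolean clearOnCancel) {
        StopWithSavepointSketch fn = new StopWithSavepointSketch(instant, clearOnCancel);
        fn.cancel();
        fn.snapshotState();
        fn.close();
        return fn.instantState;
    }

    public static void main(String[] args) {
        System.out.println(stopWithSavepoint("20230414000000", true));  // [null]
        System.out.println(stopWithSavepoint("20230414000000", false)); // [20230414000000]
    }
}
```

With clearOnCancel set to true, the savepoint stores a null entry, which is the state described in this report.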

 

As such, when resuming from a savepoint, there will be:
 # data loss if the LATEST *_read.start.commit_* is used (this is the default 
when no value is specified)
 # duplicated data if the EARLIEST *_read.start.commit_* is used
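
The two outcomes can be illustrated with a simplified model of how a restarted reader might pick its starting point. The timeline c1..c5, the instantsToRead helper, and the exact semantics of "latest" (start at the newest instant only) are assumptions for illustration and may differ from what Hudi actually does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of resume behaviour (hypothetical, not Hudi code).
class ResumeSketch {
    // Returns the instants a restarted reader would consume under this model.
    public static List<String> instantsToRead(List<String> timeline, String restored, String startCommit) {
        if (restored != null) {
            // Normal resume: continue strictly after the instant stored in the savepoint.
            int idx = timeline.indexOf(restored);
            return new ArrayList<>(timeline.subList(idx + 1, timeline.size()));
        }
        if ("earliest".equals(startCommit)) {
            // No usable state and EARLIEST: everything is read again.
            return new ArrayList<>(timeline);
        }
        // No usable state and the default (LATEST): only the newest instant is read.
        return new ArrayList<>(timeline.subList(timeline.size() - 1, timeline.size()));
    }

    public static void main(String[] args) {
        // Assume c1..c3 were consumed before the savepoint; c4 and c5 arrived while the job was down.
        List<String> timeline = Arrays.asList("c1", "c2", "c3", "c4", "c5");
        System.out.println(instantsToRead(timeline, "c3", null));       // [c4, c5]
        System.out.println(instantsToRead(timeline, null, null));       // [c5]
        System.out.println(instantsToRead(timeline, null, "earliest")); // [c1, c2, c3, c4, c5]
    }
}
```

In this model, a null restored instant with the default setting skips c4 (data loss), while EARLIEST re-reads c1 to c3 (duplicates).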

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
