big-doudou commented on PR #9182:
URL: https://github.com/apache/hudi/pull/9182#issuecomment-1649253338

   When the Flink Hudi sink uses the bucket index and a large amount of data
arrives between two checkpoints, part of the data is flushed to HDFS before the
checkpoint completes, and a file ID is generated for the bucket at that point.
If the TaskManager (TM) restarts abnormally before the checkpoint completes,
this code treats the restart as a partial failover of the Flink job and
recovers without executing Bootstrap(). As a result, the previously generated
instant is reused but the old log file is never cleaned up, so the same bucket
ends up with duplicate file IDs (more than one file ID for one bucket).
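
A minimal toy model of the failure sequence above might help reviewers. It assumes, as the comment describes, that Bootstrap() is the step that cleans up files written under the uncommitted instant, and that a restarted write task loses its in-memory bucket-to-file-ID mapping. All names here (`FakeFileSystem`, `BucketWriteTask`, `simulate`) are hypothetical and are not Hudi's API; the `%08d` bucket prefix on file IDs is only an approximation of how bucket-index file IDs look.

```python
import uuid


class FakeFileSystem:
    """Hypothetical stand-in for HDFS: tracks file IDs per (instant, bucket)."""

    def __init__(self):
        self.files = {}  # (instant, bucket) -> set of file IDs on storage

    def write_log(self, instant, bucket, file_id):
        self.files.setdefault((instant, bucket), set()).add(file_id)

    def delete_instant(self, instant):
        # Assumption: cleanup removes everything written under the instant.
        for key in [k for k in self.files if k[0] == instant]:
            del self.files[key]


def new_file_id(bucket):
    # Approximation: bucket number as a zero-padded prefix plus a random suffix.
    return f"{bucket:08d}-{uuid.uuid4()}"


class BucketWriteTask:
    """Hypothetical write task; its bucket -> file ID map is lost on TM restart."""

    def __init__(self, fs, instant, run_bootstrap):
        self.fs = fs
        self.instant = instant
        self.bucket_to_file_id = {}
        if run_bootstrap:
            # Assumed behaviour of Bootstrap(): clean up the uncommitted instant.
            fs.delete_instant(instant)

    def flush(self, bucket):
        # Reuse the bucket's file ID if known, otherwise generate a fresh one.
        if bucket not in self.bucket_to_file_id:
            self.bucket_to_file_id[bucket] = new_file_id(bucket)
        self.fs.write_log(self.instant, bucket, self.bucket_to_file_id[bucket])


def simulate(run_bootstrap_on_recovery):
    fs = FakeFileSystem()
    instant = "inflight-instant"  # hypothetical instant time, reused on recovery
    task = BucketWriteTask(fs, instant, run_bootstrap=True)
    task.flush(bucket=3)  # mid-checkpoint flush creates a file ID for bucket 3
    # TM restarts before the checkpoint completes; the same instant is reused.
    task = BucketWriteTask(fs, instant, run_bootstrap=run_bootstrap_on_recovery)
    task.flush(bucket=3)
    return fs.files[(instant, 3)]
```

With `simulate(run_bootstrap_on_recovery=False)` the bucket holds two file IDs, which is the duplicate described above; with `True` the stale file is removed first and only one file ID remains.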
