[ https://issues.apache.org/jira/browse/HUDI-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shizhi Chen updated HUDI-4287:
------------------------------
Description:
CkpMetadata was introduced into the Flink module to reduce the timeline burden, but its current mechanism lacks a corresponding status for rollback instants. This may result in the deletion of commit/delta commit instants, so that StreamWriteOperatorCoordinator (meta end) and the write function (data end) are no longer coordinated correctly. As a result, data files can be deleted by mistake.
This situation is easy to reproduce, especially when StreamWriteOperatorCoordinator spends a long time scheduling table services between committing and initializing instants after restoring from a checkpoint.
was:
CkpMetadata is introduced into flink module to reducing timeline burden, but currently its mechanism lacks corresponding status for rollback instants, which may result in commit/delta commit instants deletion, and thus StreamWriteOperatorCoordinator(meta end) and Write function(data end) will not be coordinatited correctly.
Finally, data files will be deleted by mistake. This situation will be easy to reproduced especially when StreamWriteOperatorCoordinator schedules table services for a long time between commit and init instants after the restoring from a checkpoint.
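A minimal sketch of the failure mode described above, assuming a simplified in-memory model of the checkpoint metadata: the coordinator appends one message per instant state change, and a write task asks for the newest instant that is still pending. The class and method names (CkpMetadataSketch, startInstant, commitInstant, rollbackInstant, lastPendingInstant) and the ABORTED state are hypothetical illustrations of "a corresponding status for rollback instants"; they are not Hudi's actual CkpMetadata API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified model of Flink checkpoint metadata.
// NOT Hudi's real CkpMetadata API; all names are illustrative assumptions.
class CkpMetadataSketch {
  enum State { INFLIGHT, COMPLETED, ABORTED }

  static final class Message {
    final String instant;
    final State state;

    Message(String instant, State state) {
      this.instant = instant;
      this.state = state;
    }
  }

  private final List<Message> messages = new ArrayList<>();
  // When false, rollbacks leave no trace, mimicking the missing rollback status.
  private final boolean recordRollbacks;

  CkpMetadataSketch(boolean recordRollbacks) {
    this.recordRollbacks = recordRollbacks;
  }

  // Meta end (coordinator): a new instant is initialized.
  void startInstant(String instant) {
    messages.add(new Message(instant, State.INFLIGHT));
  }

  // Meta end (coordinator): the instant is committed.
  void commitInstant(String instant) {
    messages.add(new Message(instant, State.COMPLETED));
  }

  // Meta end (coordinator): the instant is rolled back, e.g. after a restore.
  void rollbackInstant(String instant) {
    if (recordRollbacks) {
      messages.add(new Message(instant, State.ABORTED));
    }
  }

  // Data end (write function): newest instant with an INFLIGHT message and
  // no later terminal (COMPLETED/ABORTED) message; null if nothing is pending.
  String lastPendingInstant() {
    Set<String> finished = new HashSet<>();
    for (int i = messages.size() - 1; i >= 0; i--) {
      Message m = messages.get(i);
      if (m.state != State.INFLIGHT) {
        finished.add(m.instant);
      } else if (!finished.contains(m.instant)) {
        return m.instant;
      }
    }
    return null;
  }
}
```

In this model, if instant "001" is rolled back while the coordinator is still busy scheduling table services and has not yet initialized "002", the variant without a rollback status still reports "001" as pending, so a write task would write data files under an instant that has already been rolled back. Recording an explicit aborted state makes the write task see no pending instant and wait for the next one instead.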
> Optimize Flink checkpoint meta mechanism to fix mistaken pending instants
> -------------------------------------------------------------------------
>
> Key: HUDI-4287
> URL: https://issues.apache.org/jira/browse/HUDI-4287
> Project: Apache Hudi
> Issue Type: Bug
> Components: flink
> Reporter: Shizhi Chen
> Assignee: Shizhi Chen
> Priority: Blocker
> Fix For: 0.12.0
>
--
This message was sent by Atlassian Jira
(v8.20.7#820007)