Hi, thanks for your replies and sorry for the delay. Most of my questions were answered, but I still have some concerns.
> If there are no further concerns by next Monday (June 22), I'll go ahead and start the [VOTE] thread for this FLIP.

Isn't the actual FLIP still missing? I only saw a Google Document. Do you mind creating a page according to [1]?

----------------------------------------

> 3. Checkpoint metadata layout
> Regional Checkpoint recombines state from different checkpoint IDs. To track this, we add a refCheckpointId field to OperatorSubtaskState in the metadata, indicating which historical checkpoint a subtask’s state references.

Could you explain how we find the right OperatorSubtaskState, especially in case of rescaling? Does the proposal support rescaling?

> 9. Finished operators
> The concern is: a finished operator’s final commit notification gets skipped by Regional Checkpoint, and if this checkpoint is the last one, the operator never receives it. Could this cause data loss?
> In practice, the impact is limited:
> ● Failed Region tasks are already gone: By the time the Regional Checkpoint completes, tasks in the failed Region have already been restarted (decline) or cancelled (timeout). There is no task left to receive the notification anyway.

Checkpoint failure doesn't necessarily cause a restart (especially if this is limited to one region). The tasks should still be up and running.

> ● maxConsecutiveFailures guarantees a global checkpoint: After reaching the limit, the next checkpoint is forced to be global, ensuring all tasks eventually receive notifyCheckpointComplete. We can’t skip the same Region forever.

maxConsecutiveFailures might not be reached for the final checkpoint.

> ● stop-with-savepoint bypasses Regional Checkpoint: When the user stops the job gracefully, it triggers a full global snapshot, not a Regional Checkpoint. So the final checkpoint is always complete.

stop-with-savepoint should be fine, yes. To clarify, my concern is about jobs with bounded sources.
In such cases, some subtasks might finish processing but still participate in checkpoints. After a successful checkpoint, they are guaranteed to get a checkpoint completion notification, so that they can make side effects visible in external systems (e.g. commit Kafka transactions). See FLIP-147 [2].

However, with the current proposal, the job might complete with some subtasks/regions failing the final checkpoint, unless I'm missing something. This is essentially data loss. To prevent this, the final checkpoint must always be acked by all subtasks/regions.

----------------------------------------

There are quite a few limitations in this proposal. Could you add a section describing how each of them is handled?
1. Reject job submission
2. Force all-region checkpoint
3. Warn in documentation

> 1. Region independence: BLOCKING/HYBRID edges
> You’re right. Our current scope is limited to embarrassingly parallel regions. In typical ETL scenarios, each parallelism maps to an independent Region with no edges connecting them.
>
> 5. SharedStateRegistry: how are old states kept alive?
> Good question. In the current design, since we only target embarrassingly parallel regions, there is typically no keyed state and no incremental state. As a result, the SharedStateRegistry is generally empty (setting aside File Merging and Changelog State for now, discussed in point 8), so keep-alive of files under the shared directory is not a concern.
>
> 8. FLINK-26803 and FLIP-306 compatibility
> This is a very important point. Both features essentially merge small files at the job level. As Rui Fan pointed out, if the merging granularity is reduced to the Region level, compatibility with Regional Checkpoint should be achievable in theory. I think this can be deferred to future work: once FLINK-26803 is consolidated into FLIP-306, we can revisit and enable support.
>
> 10. NO_CLAIM mode warning
> You’re absolutely right, this is an important reminder.
> After restoring from a Regional Checkpoint, only a successful global checkpoint guarantees independence from the old state. We’ll add a clear user warning in the documentation.
>
> 11. Changelog state backend: not supported
> As mentioned earlier, our primary target is embarrassingly parallel regions, which typically have no keyed state and therefore no slow incremental state flush issues. I don’t think we need to support Changelog state backend for now.

----------------------------------------

> 2. max-consecutive-failures exceeded: what exactly happens?
> The current design says “force a global checkpoint.” To clarify the two-tier behavior:
> ● Tier 1: When consecutiveRegionalCount >= maxConsecutiveFailures, the next checkpoint is forced to be global.
> ● Tier 2: If that forced global checkpoint also fails (any task declines), the checkpoint is aborted normally (not a job failure). The counter is then reset since a global checkpoint was attempted, and the next checkpoint cycle can try again.
> This avoids cascading into job failure while ensuring we don’t drift indefinitely on historical state.

My assumption was that we would not allow this particular failed region to fail the checkpoint again. But forcing a global checkpoint works as well.

> 6. Checkpoint abort notifications & Local Recovery cleanup: new notification type
> This is a very insightful point. Zihao and Gen also raised this in earlier discussions. The current design doesn’t address state cleanup for tasks in failed regions. I agree it’s necessary to introduce a new notification type. For tasks in failed regions, local state cleanup can be deferred until the next checkpoint trigger.

OK, this can be future work.

> 7. Task that never acknowledges nor declines: per-region timeouts
> This was discussed in the previous thread. Network issues may cause a task to neither ack nor decline in time.
> In such cases, we treat it as a checkpoint timeout: the affected tasks’ region is marked as failed, and the process ultimately falls through to the normal Regional Checkpoint processing logic.

OK, this can be future work.

----------------------------------------

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65145551#FlinkImprovementProposals-CreateyourOwnFLIP
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-147%3A+Support+Checkpoints+After+Tasks+Finished

Regards,
Roman

On Wed, Jun 17, 2026 at 9:56 AM 熊饶饶 <[email protected]> wrote:

> Hi all,
>
> Thanks everyone for the valuable feedback. I believe all the points raised
> above have been addressed (@Roman @Rui Fan). If there are no further
> concerns by next Monday (June 22), I'll go ahead and start the [VOTE]
> thread for this FLIP.
>
> For reference, the earlier related discussion can be found here:
> https://lists.apache.org/thread/qpztk0jdpcmhomszjx63l53xv26xnmwf
>
> Please feel free to share any additional feedback before then.
>
> Best Regards,
> Raorao
>
> On May 27, 2026, at 16:31, 熊饶饶 <[email protected]> wrote:
>
> Hi devs,
>
> I would like to start a discussion on FLIP-XXX: Independent Checkpoint
> Based On Pipeline Region.
>
> In high-parallelism streaming jobs, a single Task's checkpoint failure
> causes the entire global Checkpoint to abort, leading to degraded
> checkpoint success rates and wasted compute resources (especially for GPU
> operators).
>
> We propose Regional Checkpoint: when some Regions fail to checkpoint, the
> framework combines the historical state of the failed Regions with the
> current state of the healthy Regions to produce a logically complete
> Completed Checkpoint, while preserving state consistency. The key changes
> are:
>
> 1. Snapshot Collection: Allow partial region failures; combine last
> successful state of failed Regions with current state of normal Regions.
>
> 2. State Correction: New checkpointCoordinatorForRegionFallback interface
> for OperatorCoordinators to produce consistent snapshots against the mixed
> view.
>
> 3. Checkpoint Store: Track ref_checkpoint_id in metadata to prevent
> premature cleanup of referenced historical checkpoints.
>
> The detailed design is described in the FLIP document:
>
> https://docs.google.com/document/d/153r9NjHN9xgFUBdZ8sNX6YjUWTREtDMv5i-JaMdE6NU/edit?usp=sharing
>
> Looking forward to your feedback!
>
> Best regards,
>
> Raorao Xiong
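
For illustration, the two-tier max-consecutive-failures behavior from point 2, combined with the requirement from point 9 that the final checkpoint be acked by all regions, could be sketched roughly as follows. This is only a sketch in Python, not Flink code: RegionalCheckpointPolicy, next_must_be_global, and on_checkpoint_result are hypothetical names, and resetting the counter after a checkpoint in which every region succeeded is an assumption that the thread does not confirm.

```python
from dataclasses import dataclass


@dataclass
class RegionalCheckpointPolicy:
    """Hypothetical policy object for illustration; not part of Flink's API."""

    max_consecutive_failures: int
    consecutive_regional_count: int = 0

    def next_must_be_global(self, is_final_checkpoint: bool) -> bool:
        # Suggested rule from the bounded-source concern: the final checkpoint
        # is always global, so every finished subtask gets its notification.
        if is_final_checkpoint:
            return True
        # Tier 1: too many regional checkpoints in a row forces a global one.
        return self.consecutive_regional_count >= self.max_consecutive_failures

    def on_checkpoint_result(self, was_global: bool, failed_regions: int) -> str:
        """Return 'completed', 'completed_regional', or 'aborted'."""
        if was_global:
            # Tier 2: a global attempt resets the counter whether or not it
            # succeeded; a failure aborts the checkpoint, not the job.
            self.consecutive_regional_count = 0
            return "completed" if failed_regions == 0 else "aborted"
        if failed_regions == 0:
            # Assumption: if every region acked, nothing references
            # historical state, so the counter is reset as well.
            self.consecutive_regional_count = 0
            return "completed"
        # Some regions fell back to historical state.
        self.consecutive_regional_count += 1
        return "completed_regional"
```

With max_consecutive_failures=2, two regional completions in a row make next_must_be_global(False) return True, and a failed forced global checkpoint returns "aborted" and resets the counter. Passing is_final_checkpoint=True forces a global checkpoint regardless of the counter, which is the behavior the bounded-source case would need.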
