[jira] [Updated] (FLINK-38974) Handle jobs during HA recovery

Yi Zhang (Jira) Thu, 26 Feb 2026 18:07:16 -0800


     [ https://issues.apache.org/jira/browse/FLINK-38974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Yi Zhang updated FLINK-38974:
-----------------------------
    Description: 
This sub-task is part of the application high-availability (HA) improvements. 
It covers job recovery behavior during JobManager (JM) failover for running 
applications.
 * For running jobs that are recovered after a JM failover, execution is 
deferred until an explicit application recovery signal is received. If no such 
signal arrives before the application reaches a terminal state (e.g., finished, 
failed, or cancelled), the job is marked as failed, basic metadata is preserved 
for visibility, and proper cleanup is performed.
 * For already-terminated jobs, the cleanup of job results (i.e., marking them 
as "clean") is delayed until after the application itself terminates. If the 
application later recovers and restarts, any "dirty" job results from the 
previous run are cleaned up during the recovery process.
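
The two rules above can be sketched as a small state tracker. This is only an 
illustration of the described behavior, not Flink code: the class 
RecoveredJobSketch, its method names, and the JobState values are all 
hypothetical, and it assumes the recovery signal identifies the job to resume.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of the recovery rules; none of these names are Flink APIs. */
final class RecoveredJobSketch {

    enum JobState { DEFERRED, RUNNING, FAILED }

    // Jobs that were running before the JM failover, keyed by job ID.
    private final Map<String, JobState> jobs = new LinkedHashMap<>();
    // Results of already-terminated jobs that are not yet marked "clean".
    private final List<String> dirtyResults = new ArrayList<>();

    /** A running job was recovered after failover: defer its execution. */
    void onRunningJobRecovered(String jobId) {
        jobs.put(jobId, JobState.DEFERRED);
    }

    /** An already-terminated job was found: keep its result dirty for now. */
    void onTerminatedJobRecovered(String jobId) {
        dirtyResults.add(jobId);
    }

    /** Explicit recovery signal (assumed to name the job): resume execution. */
    void onRecoverySignal(String jobId) {
        if (jobs.get(jobId) == JobState.DEFERRED) {
            jobs.put(jobId, JobState.RUNNING);
        }
    }

    /** The application reached a terminal state (finished, failed, cancelled). */
    void onApplicationTerminated() {
        // Jobs that never received a signal are marked failed; the map entry
        // stays in place, standing in for the preserved basic metadata.
        jobs.replaceAll((id, state) -> state == JobState.DEFERRED ? JobState.FAILED : state);
        // Only now are the delayed job results marked clean.
        dirtyResults.clear();
    }

    /** The application recovered and restarted: clean up the previous run's dirty results. */
    List<String> onApplicationRestarted() {
        List<String> cleaned = new ArrayList<>(dirtyResults);
        dirtyResults.clear();
        return cleaned;
    }

    JobState stateOf(String jobId) {
        return jobs.get(jobId);
    }

    List<String> pendingDirtyResults() {
        return new ArrayList<>(dirtyResults);
    }

    public static void main(String[] args) {
        RecoveredJobSketch sketch = new RecoveredJobSketch();
        sketch.onRunningJobRecovered("job-a");
        sketch.onRunningJobRecovered("job-b");
        sketch.onTerminatedJobRecovered("job-c");
        sketch.onRecoverySignal("job-a");
        sketch.onApplicationTerminated();
        System.out.println("job-a: " + sketch.stateOf("job-a"));
        System.out.println("job-b: " + sketch.stateOf("job-b"));
    }
}
```

In this sketch, a job that receives the signal moves from DEFERRED to RUNNING, 
while a job still DEFERRED when the application terminates ends up FAILED. 
Dirty results are cleared either when the application terminates or when it 
restarts, whichever happens first.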

> Handle jobs during HA recovery
> ------------------------------
>
>                 Key: FLINK-38974
>                 URL: https://issues.apache.org/jira/browse/FLINK-38974
>             Project: Flink
>          Issue Type: Sub-task
>            Reporter: Yi Zhang
>            Assignee: Yi Zhang
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
