[jira] [Comment Edited] (FLINK-28473) JobManager restart/failover doesn't trigger local recovery on TaskManagers

Levani Kokhreidze (Jira) Mon, 11 Jul 2022 02:06:06 -0700


    [ 
https://issues.apache.org/jira/browse/FLINK-28473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17564881#comment-17564881
 ]


Levani Kokhreidze edited comment on FLINK-28473 at 7/11/22 9:05 AM:
--------------------------------------------------------------------

Hi [~Ming Li], thanks for the comment.

I'm wondering whether, as a future improvement (FLIP), this could be achieved in
HA mode by storing allocation metadata as part of the leader information. During
failover, the new leader could then try to reuse the previous leader's
allocation information.

Of course, it will be more complex than that; I'm just thinking out loud. I
think this would be a great improvement for anyone who deploys the JobManager
in a dynamic environment like Kubernetes.
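To make the idea a bit more concrete, here is a minimal Python sketch. It assumes a plain dict stands in for the HA leader store (for example a ZooKeeper node), and that "allocation metadata" simply means a map from subtask to the TaskManager resource ID that last ran it. `LeaderStore`, `JobMasterSketch` and the subtask/TaskManager names are hypothetical and are not Flink APIs. Flink's real scheduler works with allocation IDs and slot pools and is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class LeaderStore:
    """Hypothetical stand-in for the HA leader store (e.g. a ZooKeeper node)."""
    leader_info: Dict[str, dict] = field(default_factory=dict)

    def publish(self, job_id: str, info: dict) -> None:
        self.leader_info[job_id] = info

    def read(self, job_id: str) -> Optional[dict]:
        return self.leader_info.get(job_id)


@dataclass
class JobMasterSketch:
    """Hypothetical leader that remembers subtask -> TaskManager resource id."""
    job_id: str
    store: LeaderStore
    allocations: Dict[str, str] = field(default_factory=dict)

    def recover_previous_allocations(self) -> None:
        # On gaining leadership, reuse whatever the previous leader published.
        info = self.store.read(self.job_id)
        if info is not None:
            self.allocations = dict(info["allocations"])

    def assign(self, subtasks: List[str], free_tms: List[str]) -> Dict[str, str]:
        if len(free_tms) < len(subtasks):
            raise ValueError("not enough free TaskManagers")
        remaining = list(free_tms)
        result: Dict[str, str] = {}
        # Pass 1: keep each subtask on its previous TaskManager when possible,
        # so the local state copy in that TaskManager's working dir can be used.
        for st in subtasks:
            prev = self.allocations.get(st)
            if prev is not None and prev in remaining:
                result[st] = prev
                remaining.remove(prev)
        # Pass 2: everything else goes to any free TaskManager
        # (no local copy there, so state would come from remote storage).
        for st in subtasks:
            if st not in result:
                result[st] = remaining.pop(0)
        self.allocations = result
        # Publish next to the leader info so a successor can read it.
        self.store.publish(self.job_id, {"allocations": dict(result)})
        return result


subtasks = ["source-0", "window-0"]

store = LeaderStore()
first = JobMasterSketch("job-1", store).assign(subtasks, ["tm-0", "tm-1"])

# Failover with published metadata: TaskManagers re-register in another order.
successor = JobMasterSketch("job-1", store)
successor.recover_previous_allocations()
second = successor.assign(subtasks, ["tm-1", "tm-0"])

# Failover without metadata: the new leader starts from scratch.
fresh = JobMasterSketch("job-1", LeaderStore())
third = fresh.assign(subtasks, ["tm-1", "tm-0"])

print(first, second, third, sep="\n")
```

In this sketch, the successor that reads the published metadata puts every subtask back on its previous TaskManager. The fresh leader assigns subtasks purely by registration order, so they end up on TaskManagers without a local copy. That is consistent with the remote-only recovery described in the issue, although the actual cause inside Flink's scheduler may differ.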


was (Author: lkokhreidze):
Hi [~Ming Li], thanks for the comment.

I'm wondering whether, as a future improvement (FLIP), this could be achieved in
HA mode by storing allocation metadata as part of the leader information. During
failover, the new leader could then try to reuse the previous leader's
allocation information.

Of course, it will be more complex than that; I'm just thinking out loud.

> JobManager restart/failover doesn't trigger local recovery on TaskManagers
> --------------------------------------------------------------------------
>
>                 Key: FLINK-28473
>                 URL: https://issues.apache.org/jira/browse/FLINK-28473
>             Project: Flink
>          Issue Type: Bug
>          Components: Runtime / State Backends, Runtime / Task
>    Affects Versions: 1.15.0, 1.15.1
>            Reporter: Levani Kokhreidze
>            Priority: Major
>
> Hi! While experimenting with the local recovery feature (Flink 1.15.1), I
> noticed that if the JobManager is restarted, TaskManagers always recover from
> remote storage ({{IncrementalRemoteKeyedStateHandle}}), whereas if I restart
> the TaskManagers, local recovery is triggered.
>  
> Setup:
>  * HA setup with ZooKeeper and S3 remote storage.
>  * The JobManager runs as a StatefulSet with a PersistentVolume. Both
> {{process.jobmanager.working-dir}} and {{jobmanager.resource-id}} are
> correctly configured.
>  * TaskManagers run as StatefulSets with PersistentVolumes. Both
> {{process.taskmanager.working-dir}} and {{taskmanager.resource-id}} are
> correctly configured.
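For reference, the setup above roughly corresponds to a TaskManager configuration like the one below. This is a minimal Python sketch that embeds a hypothetical flink-conf.yaml and checks the local-recovery prerequisites. The ZooKeeper host, S3 bucket, paths and resource ID are placeholders. {{state.backend: rocksdb}} with incremental checkpoints is an assumption inferred from {{IncrementalRemoteKeyedStateHandle}}. {{parse_flat_conf}} and {{check_local_recovery_conf}} are hypothetical helpers using a simplified line-based parser, not Flink's own configuration loader.

```python
from typing import Dict, List

# Hypothetical flink-conf.yaml for one TaskManager pod of the StatefulSet.
# Host names, bucket, paths and the resource id are placeholders.
TASKMANAGER_CONF = """
# HA with ZooKeeper, checkpoints and HA metadata on S3
high-availability: zookeeper
high-availability.zookeeper.quorum: zk-0.zk:2181
high-availability.storageDir: s3://my-bucket/flink/ha
state.backend: rocksdb
state.backend.incremental: true
state.checkpoints.dir: s3://my-bucket/flink/checkpoints
state.backend.local-recovery: true
# Working dir on the PersistentVolume, stable id (e.g. the pod name)
process.taskmanager.working-dir: /flink-data/working-dir
taskmanager.resource-id: flink-taskmanager-0
"""


def parse_flat_conf(text: str) -> Dict[str, str]:
    """Parse flat 'key: value' lines, ignoring blank lines and '#' comments."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Split on the first ':' only, so values like s3://... stay intact.
        key, sep, value = line.partition(":")
        if sep:
            conf[key.strip()] = value.strip()
    return conf


def check_local_recovery_conf(conf: Dict[str, str], role: str) -> List[str]:
    """Report missing prerequisites for `role` ("jobmanager" or "taskmanager")."""
    problems = []
    for key in (f"process.{role}.working-dir", f"{role}.resource-id"):
        if not conf.get(key):
            problems.append(f"{key} is not set")
    if role == "taskmanager" and conf.get("state.backend.local-recovery", "false").lower() != "true":
        problems.append("state.backend.local-recovery is not enabled")
    if conf.get("high-availability", "NONE").lower() == "none":
        problems.append("high-availability is not configured")
    return problems


conf = parse_flat_conf(TASKMANAGER_CONF)
print(check_local_recovery_conf(conf, "taskmanager"))  # []
```

According to the report, these prerequisites are in place and TaskManager restarts do use local state. Only the JobManager failover path falls back to remote state, so a check like this can rule out misconfiguration but does not address the issue itself.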



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
