[ 
https://issues.apache.org/jira/browse/FLINK-20798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hayden zhou updated FLINK-20798:
--------------------------------
    Description: 

When deploying standalone Flink on Kubernetes with
{{high-availability.storageDir}} configured to a mounted PVC directory, the Flink web UI
cannot be accessed normally. It shows "Service temporarily unavailable
due to an ongoing leader election. Please refresh".
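To check the symptom without a browser, one could poll the REST API and look for the same message. The following is a minimal sketch, not part of Flink, and it rests on several assumptions:
* Java 11+ ({{java.net.http.HttpClient}}) is available.
* The REST endpoint is reachable at {{http://mta-flink-jobmanager:8081}}, the address seen in the logs.
* The {{/overview}} endpoint returns the same leader-election error text in its response body while no leader is known.

The class name {{RestLeaderProbe}} is hypothetical.

{code:java}
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not part of Flink): reports whether the REST API still answers
// with the "ongoing leader election" error that the web UI shows.
final class RestLeaderProbe {
    // Message text copied from the web UI error in this report.
    static final String ELECTION_MESSAGE =
            "Service temporarily unavailable due to an ongoing leader election";

    // Pure check on a response body, kept separate so it can be tested offline.
    static boolean isWaitingForLeader(String responseBody) {
        return responseBody != null && responseBody.contains(ELECTION_MESSAGE);
    }

    // Sends one GET request to <baseUrl>/overview and applies the check above.
    static boolean probe(String baseUrl) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/overview"))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        return isWaitingForLeader(response.body());
    }

    public static void main(String[] args) throws Exception {
        // Default address taken from the JobManager log; override with the first argument.
        String baseUrl = args.length > 0 ? args[0] : "http://mta-flink-jobmanager:8081";
        System.out.println(probe(baseUrl)
                ? "REST endpoint still reports an ongoing leader election"
                : "REST endpoint answered without the leader-election error");
    }
}
{code}

With Java 11 or later it can be launched directly from source, e.g. {{java RestLeaderProbe.java http://mta-flink-jobmanager:8081}}.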

 


The following are the related logs from the JobManager.

{code}
2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
election started
 2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to acquire leader lease 'ConfigMapLock: default - 
mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket 
successfully opened
 2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Starting DefaultLeaderRetrievalService with 
KubernetesLeaderRetrievalDriver{configMapName='mta-flink-resourcemanager-leader'}.
 2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - 
Connecting websocket ... 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@6d303498
 2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket 
successfully opened
 2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Starting DefaultLeaderRetrievalService with 
KubernetesLeaderRetrievalDriver{configMapName='mta-flink-dispatcher-leader'}.
 2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket 
successfully opened
 2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Starting DefaultLeaderElectionService with 
KubernetesLeaderElectionDriver{configMapName='mta-flink-resourcemanager-leader'}.
 2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
 2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
mta-flink-restserver-leader.
 2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Successfully Acquired leader lease 'ConfigMapLock: default - 
mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
 2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant 
leadership to contender 
http://mta-flink-jobmanager:8081 with
session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
 2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO 
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - 
http://mta-flink-jobmanager:8081 was
granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
 2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader 
http://mta-flink-jobmanager:8081.
 2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
 2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
mta-flink-resourcemanager-leader.
 2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Successfully Acquired leader lease 'ConfigMapLock: default - 
mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
 2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant 
leadership to contender LeaderContender: StandaloneResourceManager with session 
ID b1730dc6-0f94-49f4-b519-56917f3027b7.
 2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to renew leader lease 'ConfigMapLock: default - 
mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
 2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
mta-flink-dispatcher-leader.
 2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Successfully Acquired leader lease 'ConfigMapLock: default - 
mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
 2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant 
leadership to contender LeaderContender: DefaultDispatcherRunner with session 
ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
 2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG 
org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - Create 
new DispatcherLeaderProcess with leader session id 
fbbaa883-69f6-43df-9ca0-c646bc1baad1.
 2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Start SessionDispatcherLeaderProcess.
 2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to renew leader lease 'ConfigMapLock: default - 
mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Recover all persisted job graphs.
 2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all 
stored job ids from 
KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}.
 2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
ResourceManager 
akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was 
granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
 2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
Starting the SlotManager.
 2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG 
org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] 
- Successfully wrote leader information: 
Leader=http://mta-flink-jobmanager:8081,
session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
 2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to renew leader lease 'ConfigMapLock: default - 
mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader 
akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
 2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG 
org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] 
- Successfully wrote leader information: 
Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, 
session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
 2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids 
[] from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}
 2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Successfully recovered 0 persisted job graphs.
 2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG 
org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting 
FencedAkkaRpcActor with name dispatcher_1.
 2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO 
org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
akka://flink/user/rpc/dispatcher_1 .
 2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader 
akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
 2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG 
org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] 
- Successfully wrote leader information: 
Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1, 
session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
 2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - 
-Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true
{code}
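The log shows three things:
* A leader is elected for each of the {{mta-flink-restserver-leader}}, {{mta-flink-resourcemanager-leader}} and {{mta-flink-dispatcher-leader}} ConfigMaps.
* Leader information is then written for the REST endpoint, the ResourceManager and the dispatcher.
* The web UI still reports an ongoing election despite this.

The following is a hypothetical triage helper that extracts these two kinds of events from a JobManager log, so the check can be repeated on other runs. It is not part of Flink, and {{LeaderLogSummary}} and its methods are made-up names. It assumes the message formats exactly as they appear in this log. The patterns use {{\s+}} between tokens so that messages wrapped across lines, as in the excerpt above, still match.

{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical triage helper (not part of Flink): extracts leader-election events
// from JobManager log text. Message formats are copied from the log in this report.
final class LeaderLogSummary {
    // "New leader elected <lock identity> for <ConfigMap name>."; \s+ also matches line breaks.
    private static final Pattern ELECTED = Pattern.compile(
            "New leader elected\\s+(\\S+)\\s+for\\s+([a-z0-9-]+)");
    // "Successfully wrote leader information: Leader=<address>, session ID=<uuid>."
    private static final Pattern WRITTEN = Pattern.compile(
            "Successfully wrote leader information:\\s+Leader=(\\S+?),\\s+session ID=([0-9a-f-]+)");

    // Maps each ConfigMap name to the lock identity reported as its elected leader.
    static Map<String, String> electedLeaders(String log) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = ELECTED.matcher(log);
        while (m.find()) {
            result.put(m.group(2), m.group(1));
        }
        return result;
    }

    // Lists, in log order, the leader addresses whose information was written.
    static List<String> writtenLeaderAddresses(String log) {
        List<String> result = new ArrayList<>();
        Matcher m = WRITTEN.matcher(log);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    // Usage: java LeaderLogSummary.java <path to JobManager log>
    public static void main(String[] args) throws IOException {
        String log = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        electedLeaders(log).forEach((configMap, leaderId) ->
                System.out.println("elected " + leaderId + " for " + configMap));
        writtenLeaderAddresses(log).forEach(address ->
                System.out.println("leader information written: " + address));
    }
}
{code}

Running it against the attached {{flink.log}}, e.g. {{java LeaderLogSummary.java flink.log}} on Java 11+, prints the following:
* one line per ConfigMap that reported an elected leader;
* one line per leader address whose information was written.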

  was:
I deployed Flink to k8s using a PVC as the high-availability storageDir. The JobManager
log shows that the leader election succeeded, but the web UI keeps showing that an election is in progress.

When deploying standalone Flink on Kubernetes with
{{high-availability.storageDir}} configured to a mounted PVC directory, the Flink web UI
cannot be accessed normally. It shows "Service temporarily unavailable
due to an ongoing leader election. Please refresh".

 

The following are the related logs from the JobManager.

{code}
2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
election started
 2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to acquire leader lease 'ConfigMapLock: default - 
mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket 
successfully opened
 2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Starting DefaultLeaderRetrievalService with 
KubernetesLeaderRetrievalDriver{configMapName='mta-flink-resourcemanager-leader'}.
 2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - 
Connecting websocket ... 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@6d303498
 2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket 
successfully opened
 2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO 
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Starting DefaultLeaderRetrievalService with 
KubernetesLeaderRetrievalDriver{configMapName='mta-flink-dispatcher-leader'}.
 2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - WebSocket 
successfully opened
 2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Starting DefaultLeaderElectionService with 
KubernetesLeaderElectionDriver{configMapName='mta-flink-resourcemanager-leader'}.
 2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
 2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
mta-flink-restserver-leader.
 2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Successfully Acquired leader lease 'ConfigMapLock: default - 
mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
 2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant 
leadership to contender 
http://mta-flink-jobmanager:8081 with
session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
 2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO 
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - 
http://mta-flink-jobmanager:8081 was
granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
 2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader 
http://mta-flink-jobmanager:8081.
 2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
 2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
mta-flink-resourcemanager-leader.
 2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Successfully Acquired leader lease 'ConfigMapLock: default - 
mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
 2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant 
leadership to contender LeaderContender: StandaloneResourceManager with session 
ID b1730dc6-0f94-49f4-b519-56917f3027b7.
 2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to renew leader lease 'ConfigMapLock: default - 
mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - Leader 
changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
 2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO 
org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
mta-flink-dispatcher-leader.
 2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Successfully Acquired leader lease 'ConfigMapLock: default - 
mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
 2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - Grant 
leadership to contender LeaderContender: DefaultDispatcherRunner with session 
ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
 2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG 
org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - Create 
new DispatcherLeaderProcess with leader session id 
fbbaa883-69f6-43df-9ca0-c646bc1baad1.
 2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Start SessionDispatcherLeaderProcess.
 2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to renew leader lease 'ConfigMapLock: default - 
mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Recover all persisted job graphs.
 2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all 
stored job ids from 
KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}.
 2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
ResourceManager 
akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was 
granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
 2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO 
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
Starting the SlotManager.
 2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG 
org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] 
- Successfully wrote leader information: 
Leader=http://mta-flink-jobmanager:8081,
session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
 2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG 
io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
Attempting to renew leader lease 'ConfigMapLock: default - 
mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
 2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader 
akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
 2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG 
org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] 
- Successfully wrote leader information: 
Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, 
session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
 2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO 
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job ids 
[] from KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}
 2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO 
org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] - 
Successfully recovered 0 persisted job graphs.
 2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG 
org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting 
FencedAkkaRpcActor with name dispatcher_1.
 2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO 
org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint for 
org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
akka://flink/user/rpc/dispatcher_1 .
 2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG 
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader 
akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
 2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG 
org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver [] 
- Successfully wrote leader information: 
Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1, 
session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
 2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG 
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - Trigger 
heartbeat request.
 2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG 
org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - 
-Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true
{code}


> Using PVC as high-availability.storageDir could not work
> --------------------------------------------------------
>
>                 Key: FLINK-20798
>                 URL: https://issues.apache.org/jira/browse/FLINK-20798
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes
>    Affects Versions: 1.12.0
>         Environment: FLINK 1.12.0
>            Reporter: hayden zhou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.13.0, 1.12.2
>
>         Attachments: flink.log
>
>
> When deploying standalone Flink on Kubernetes with
> {{high-availability.storageDir}} configured to a mounted PVC directory, the Flink web UI
> cannot be accessed normally. It shows "Service temporarily unavailable
> due to an ongoing leader election. Please refresh".
>  
> The following are the related logs from the JobManager.
> {code}
> 2020-12-29T06:45:54.177850394Z 2020-12-29 14:45:54,177 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Leader election started
>  2020-12-29T06:45:54.177855303Z 2020-12-29 14:45:54,177 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Attempting to acquire leader lease 'ConfigMapLock: default - 
> mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
>  2020-12-29T06:45:54.178668055Z 2020-12-29 14:45:54,178 DEBUG 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - 
> WebSocket successfully opened
>  2020-12-29T06:45:54.178895963Z 2020-12-29 14:45:54,178 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with 
> KubernetesLeaderRetrievalDriver{configMapName='mta-flink-resourcemanager-leader'}.
>  2020-12-29T06:45:54.179327491Z 2020-12-29 14:45:54,179 DEBUG 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - 
> Connecting websocket ... 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager@6d303498
>  2020-12-29T06:45:54.230081993Z 2020-12-29 14:45:54,229 DEBUG 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - 
> WebSocket successfully opened
>  2020-12-29T06:45:54.230202329Z 2020-12-29 14:45:54,230 INFO 
> org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
> Starting DefaultLeaderRetrievalService with 
> KubernetesLeaderRetrievalDriver{configMapName='mta-flink-dispatcher-leader'}.
>  2020-12-29T06:45:54.230219281Z 2020-12-29 14:45:54,229 DEBUG 
> io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - 
> WebSocket successfully opened
>  2020-12-29T06:45:54.230353912Z 2020-12-29 14:45:54,230 INFO 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Starting DefaultLeaderElectionService with 
> KubernetesLeaderElectionDriver{configMapName='mta-flink-resourcemanager-leader'}.
>  2020-12-29T06:45:54.237004177Z 2020-12-29 14:45:54,236 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
>  2020-12-29T06:45:54.237024655Z 2020-12-29 14:45:54,236 INFO 
> org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
> New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
> mta-flink-restserver-leader.
>  2020-12-29T06:45:54.237027811Z 2020-12-29 14:45:54,236 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Successfully Acquired leader lease 'ConfigMapLock: default - 
> mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
>  2020-12-29T06:45:54.237297376Z 2020-12-29 14:45:54,237 DEBUG 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Grant leadership to contender 
> http://mta-flink-jobmanager:8081 with
> session ID 9587e13f-322f-4cd5-9fff-b4941462be0f.
>  2020-12-29T06:45:54.237353551Z 2020-12-29 14:45:54,237 INFO 
> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint [] - 
> http://mta-flink-jobmanager:8081 was
> granted leadership with leaderSessionID=9587e13f-322f-4cd5-9fff-b4941462be0f
>  2020-12-29T06:45:54.237440354Z 2020-12-29 14:45:54,237 DEBUG 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Confirm leader session ID 9587e13f-322f-4cd5-9fff-b4941462be0f for leader 
> http://mta-flink-jobmanager:8081.
>  2020-12-29T06:45:54.254555127Z 2020-12-29 14:45:54,254 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
>  2020-12-29T06:45:54.254588299Z 2020-12-29 14:45:54,254 INFO 
> org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
> New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
> mta-flink-resourcemanager-leader.
>  2020-12-29T06:45:54.254628053Z 2020-12-29 14:45:54,254 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Successfully Acquired leader lease 'ConfigMapLock: default - 
> mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
>  2020-12-29T06:45:54.254871569Z 2020-12-29 14:45:54,254 DEBUG 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Grant leadership to contender LeaderContender: StandaloneResourceManager with 
> session ID b1730dc6-0f94-49f4-b519-56917f3027b7.
>  2020-12-29T06:45:54.256608291Z 2020-12-29 14:45:54,256 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Attempting to renew leader lease 'ConfigMapLock: default - 
> mta-flink-resourcemanager-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
>  2020-12-29T06:45:54.259155793Z 2020-12-29 14:45:54,258 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Leader changed from null to 6f6479c6-86cc-4d62-84f9-37ff968bd0e5
>  2020-12-29T06:45:54.259176091Z 2020-12-29 14:45:54,258 INFO 
> org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector [] - 
> New leader elected 6f6479c6-86cc-4d62-84f9-37ff968bd0e5 for 
> mta-flink-dispatcher-leader.
>  2020-12-29T06:45:54.25918096Z 2020-12-29 14:45:54,259 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Successfully Acquired leader lease 'ConfigMapLock: default - 
> mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'
>  2020-12-29T06:45:54.259362149Z 2020-12-29 14:45:54,259 DEBUG 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Grant leadership to contender LeaderContender: DefaultDispatcherRunner with 
> session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1.
>  2020-12-29T06:45:54.260301799Z 2020-12-29 14:45:54,260 DEBUG 
> org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunner [] - 
> Create new DispatcherLeaderProcess with leader session id 
> fbbaa883-69f6-43df-9ca0-c646bc1baad1.
>  2020-12-29T06:45:54.266724597Z 2020-12-29 14:45:54,266 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Start SessionDispatcherLeaderProcess.
>  2020-12-29T06:45:54.267718418Z 2020-12-29 14:45:54,267 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Attempting to renew leader lease 'ConfigMapLock: default - 
> mta-flink-dispatcher-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
>  2020-12-29T06:45:54.26786349Z 2020-12-29 14:45:54,267 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Recover all persisted job graphs.
>  2020-12-29T06:45:54.267976912Z 2020-12-29 14:45:54,267 DEBUG 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieving all 
> stored job ids from 
> KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}.
>  2020-12-29T06:45:54.277681598Z 2020-12-29 14:45:54,277 INFO 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> ResourceManager 
> akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0 was 
> granted leadership with fencing token b51956917f3027b7b1730dc60f9449f4
>  2020-12-29T06:45:54.280411279Z 2020-12-29 14:45:54,280 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl [] - 
> Starting the SlotManager.
>  2020-12-29T06:45:54.281367931Z 2020-12-29 14:45:54,281 DEBUG 
> org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver 
> [] - Successfully wrote leader information: 
> Leader=http://mta-flink-jobmanager:8081, 
> session ID=9587e13f-322f-4cd5-9fff-b4941462be0f.
>  2020-12-29T06:45:54.281528772Z 2020-12-29 14:45:54,281 DEBUG 
> io.fabric8.kubernetes.client.extended.leaderelection.LeaderElector [] - 
> Attempting to renew leader lease 'ConfigMapLock: default - 
> mta-flink-restserver-leader (6f6479c6-86cc-4d62-84f9-37ff968bd0e5)'...
>  2020-12-29T06:45:54.286191344Z 2020-12-29 14:45:54,286 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:45:54.286304807Z 2020-12-29 14:45:54,286 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:45:54.286438227Z 2020-12-29 14:45:54,286 DEBUG 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Confirm leader session ID b1730dc6-0f94-49f4-b519-56917f3027b7 for leader 
> akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0.
>  2020-12-29T06:45:54.309361096Z 2020-12-29 14:45:54,309 DEBUG 
> org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver 
> [] - Successfully wrote leader information: 
> Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/resourcemanager_0, 
> session ID=b1730dc6-0f94-49f4-b519-56917f3027b7.
>  2020-12-29T06:45:54.320673232Z 2020-12-29 14:45:54,320 INFO 
> org.apache.flink.runtime.jobmanager.DefaultJobGraphStore [] - Retrieved job 
> ids [] from 
> KubernetesStateHandleStore{configMapName='mta-flink-dispatcher-leader'}
>  2020-12-29T06:45:54.3206989Z 2020-12-29 14:45:54,320 INFO 
> org.apache.flink.runtime.dispatcher.runner.SessionDispatcherLeaderProcess [] 
> - Successfully recovered 0 persisted job graphs.
>  2020-12-29T06:45:54.324829616Z 2020-12-29 14:45:54,324 DEBUG 
> org.apache.flink.runtime.rpc.akka.SupervisorActor [] - Starting 
> FencedAkkaRpcActor with name dispatcher_1.
>  2020-12-29T06:45:54.325343659Z 2020-12-29 14:45:54,325 INFO 
> org.apache.flink.runtime.rpc.akka.AkkaRpcService [] - Starting RPC endpoint 
> for org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
> akka://flink/user/rpc/dispatcher_1 .
>  2020-12-29T06:45:54.33778039Z 2020-12-29 14:45:54,337 DEBUG 
> org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
> Confirm leader session ID fbbaa883-69f6-43df-9ca0-c646bc1baad1 for leader 
> akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1.
>  2020-12-29T06:45:54.36249763Z 2020-12-29 14:45:54,362 DEBUG 
> org.apache.flink.kubernetes.highavailability.KubernetesLeaderElectionDriver 
> [] - Successfully wrote leader information: 
> Leader=akka.tcp://flink@mta-flink-jobmanager:6123/user/rpc/dispatcher_1, 
> session ID=fbbaa883-69f6-43df-9ca0-c646bc1baad1.
>  2020-12-29T06:46:04.298366262Z 2020-12-29 14:46:04,297 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:46:04.298442695Z 2020-12-29 14:46:04,298 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:46:14.318174464Z 2020-12-29 14:46:14,317 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:46:14.318256849Z 2020-12-29 14:46:14,318 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:46:24.337694477Z 2020-12-29 14:46:24,337 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:46:24.337816516Z 2020-12-29 14:46:24,337 DEBUG 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager [] - 
> Trigger heartbeat request.
>  2020-12-29T06:46:26.044624193Z 2020-12-29 14:46:26,044 DEBUG 
> org.apache.flink.shaded.netty4.io.netty.buffer.AbstractByteBuf [] - 
> -Dorg.apache.flink.shaded.netty4.io.netty.buffer.checkAccessible: true
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
