[
https://issues.apache.org/jira/browse/FLINK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
An resolved FLINK-18367.
------------------------
Resolution: Not A Bug
> Flink HA Mode in Kubernetes. Fencing token not set
> --------------------------------------------------
>
> Key: FLINK-18367
> URL: https://issues.apache.org/jira/browse/FLINK-18367
> Project: Flink
> Issue Type: Bug
> Components: Runtime / Coordination
> Affects Versions: 1.10.1
> Reporter: An
> Priority: Critical
> Attachments: job_manager_old_leader.log, new_job_manager_leader.log,
> taskmanager.log, tm1.log, tm2.log
>
>
> The issue is similar to https://issues.apache.org/jira/browse/FLINK-12382
> I'm testing zetcd + session jobs in k8s, with 2 job managers and 2
> task managers. Everything works fine, but after I delete the pod running the
> job manager leader, the task managers cannot always register themselves at
> the new leader. The following exception occurs:
> {code:java}
> 2020-06-18 13:02:43,555 [Thread=flink-akka.actor.default-dispatcher-3] ERROR org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at ResourceManager failed due to an error
> java.util.concurrent.CompletionException: org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message RemoteFencedMessage(bcb7d4652fe53a2f8997dc8c87d641a7, RemoteRpcInvocation(registerTaskExecutor(TaskExecutorRegistration, Time))) sent to akka.tcp://flink@poc-ha-walle-flink-jobmanager:50010/user/resourcemanager because the fencing token is null.
>     at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
>     at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
>     at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
>     at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
>     at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
>     at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
> {code}
> The task managers are notified that the leader has changed, but it seems the
> RpcEndpoint cannot refresh its fencing token for some reason.
>
> The full log from the task manager pod is attached.
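
For readers unfamiliar with fencing, the following is a minimal sketch of the kind of check that could produce the "Fencing token not set" rejection quoted above. It is not Flink's implementation: `FencingSketch`, `FencedEndpoint`, `grantLeadership`, `revokeLeadership` and `handle` are hypothetical names chosen for illustration. The sketch assumes that the receiving endpoint holds a null token until it has been granted leadership, which is how the wording "because the fencing token is null" reads, and that it rejects every fenced message while in that state.

```java
import java.util.Objects;
import java.util.UUID;

public class FencingSketch {

    /** Hypothetical stand-in for a fenced RPC endpoint; not Flink's actual class. */
    static final class FencedEndpoint {
        // Assumption: the token stays null until this endpoint is granted leadership.
        private UUID fencingToken;

        void grantLeadership(UUID token) {
            this.fencingToken = Objects.requireNonNull(token);
        }

        void revokeLeadership() {
            this.fencingToken = null;
        }

        /** Accepts a message only if the receiver has a token and the sender's token matches it. */
        String handle(UUID senderToken, String message) {
            if (fencingToken == null) {
                throw new IllegalStateException("Fencing token not set: ignoring " + message);
            }
            if (!fencingToken.equals(senderToken)) {
                throw new IllegalStateException("Fencing token mismatch: ignoring " + message);
            }
            return "accepted " + message;
        }
    }

    public static void main(String[] args) {
        UUID leaderId = UUID.randomUUID();
        FencedEndpoint resourceManager = new FencedEndpoint();

        // The sender already knows a leader id, but the receiver has no token yet.
        try {
            resourceManager.handle(leaderId, "registerTaskExecutor");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        // Once the receiver is granted leadership with the same id, the call goes through.
        resourceManager.grantLeadership(leaderId);
        System.out.println(resourceManager.handle(leaderId, "registerTaskExecutor"));
    }
}
```

Under these assumptions, a sender that learns the new leader id before the endpoint at that address holds a token would see exactly this kind of rejection, and a retry after leadership is granted would succeed. Whether that is what happened in this report cannot be determined from the excerpt alone.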
--
This message was sent by Atlassian Jira
(v8.3.4#803005)