[
https://issues.apache.org/jira/browse/FLINK-21472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17295887#comment-17295887
]
Peng Zhang commented on FLINK-21472:
------------------------------------
[~fly_in_gis] Thanks! I will try Flink 1.12.2 once its Docker image is available.
For more context: in our case, the FencingTokenException happened when the
JobManager was redeployed to another node by K8s, and the new JobManager could
not start the jobs from their checkpoints:
{noformat}
2021-03-04 17:04:44,928 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from KubernetesStateHandleStore{configMapName='stellar-flink-cluster-8ea8bb860bdefc3884cd586f4473295a-jobmanager-leader'}.
2021-03-04 17:04:44,928 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Recovering checkpoints from KubernetesStateHandleStore{configMapName='stellar-flink-cluster-8ea8bb860bdefc3884cd586f4473295a-jobmanager-leader'}.
2021-03-04 17:04:44,933 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Found 1 checkpoints in KubernetesStateHandleStore{configMapName='stellar-flink-cluster-8ea8bb860bdefc3884cd586f4473295a-jobmanager-leader'}.
2021-03-04 17:04:44,933 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to fetch 1 checkpoints from storage.
2021-03-04 17:04:44,933 INFO org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - Trying to retrieve checkpoint 18.
2021-03-04 17:04:44,963 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Restoring job 8ea8bb860bdefc3884cd586f4473295a from Checkpoint 18 @ 1614877356663 for 8ea8bb860bdefc3884cd586f4473295a located at s3a://zalando-stellar-flink-state-eu-central-1-staging/checkpoints/8ea8bb860bdefc3884cd586f4473295a/chk-18.
2021-03-04 17:04:44,964 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - No master state to restore
2021-03-04 17:04:44,965 INFO org.apache.flink.runtime.jobmaster.JobMaster [] - Using failover strategy org.apache.flink.runtime.executiongraph.failover.flip1.RestartPipelinedRegionFailoverStrategy@530feb4d for BrandCollectionTrackingJob (8ea8bb860bdefc3884cd586f4473295a).
2021-03-04 17:04:44,970 INFO org.apache.flink.runtime.jobmaster.JobManagerRunnerImpl [] - JobManager runner for job BrandCollectionTrackingJob (8ea8bb860bdefc3884cd586f4473295a) was granted leadership with session id ecb717f4-089f-48af-8d82-63333f7d4b17 at akka.tcp://flink@stellar-flink-jobmanager:6123/user/rpc/jobmanager_4.
2021-03-04 17:05:09,618 WARN io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager [] - Exec Failure
java.net.SocketTimeoutException: sent ping but didn't receive pong within 30000ms (after 1 successful ping/pongs)
2021-03-04 17:05:14,990 ERROR org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler [] - Unhandled exception.
org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message LocalFencedMessage(9c31a87cf2ff475d049819f3fb9e4cd7, LocalRpcInvocation(requestMultipleJobDetails(Time))) because the fencing token 9c31a87cf2ff475d049819f3fb9e4cd7 did not match the expected fencing token bbc60d6ee1cc9717561f755149454d94.
{noformat}
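To make the error message concrete: in a fencing scheme, an RPC endpoint only processes a message if the token it carries equals the token of the current leader, so a caller that still holds the token from before a leadership change has every request rejected. The Java sketch is a hypothetical, heavily simplified model of that check. `FencingSketch`, `FencedEndpoint`, `FencedMessage` and `FencingTokenMismatch` are illustrative names assumed for this example, not Flink's actual classes; real Flink performs this check inside its RPC layer.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical, simplified model of a fenced RPC endpoint. All names are
// illustrative assumptions, not Flink's real API; the check only mirrors the
// behaviour described by the FencingTokenException message in the log.
public class FencingSketch {

    // Thrown when a message carries a fencing token other than the current one.
    static class FencingTokenMismatch extends RuntimeException {
        FencingTokenMismatch(String message) {
            super(message);
        }
    }

    // A message tagged with the token the sender believes is current.
    static final class FencedMessage {
        final UUID token;
        final String invocation;

        FencedMessage(UUID token, String invocation) {
            this.token = token;
            this.invocation = invocation;
        }
    }

    // Endpoint that only accepts messages fenced with its current leader token.
    static final class FencedEndpoint {
        private UUID expectedToken;

        // Called when leadership is (re)granted; a fresh token invalidates
        // every sender that still holds the previous one.
        void grantLeadership(UUID newToken) {
            this.expectedToken = newToken;
        }

        String handle(FencedMessage msg) {
            if (!Objects.equals(expectedToken, msg.token)) {
                throw new FencingTokenMismatch(
                        "Fencing token mismatch: Ignoring message " + msg.invocation
                                + " because the fencing token " + msg.token
                                + " did not match the expected fencing token " + expectedToken);
            }
            return "ok: " + msg.invocation;
        }
    }

    public static void main(String[] args) {
        FencedEndpoint endpoint = new FencedEndpoint();

        // First leadership grant; a client (e.g. a REST handler) caches this token.
        UUID cachedToken = UUID.randomUUID();
        endpoint.grantLeadership(cachedToken);
        System.out.println(endpoint.handle(new FencedMessage(cachedToken, "requestMultipleJobDetails")));

        // Leadership is granted again with a new token, e.g. after a failover.
        endpoint.grantLeadership(UUID.randomUUID());
        try {
            endpoint.handle(new FencedMessage(cachedToken, "requestMultipleJobDetails"));
        } catch (FencingTokenMismatch e) {
            // The stale client is rejected until it picks up the new token.
            System.out.println(e.getMessage());
        }
    }
}
```

Running `main` first prints `ok: requestMultipleJobDetails`, then a mismatch message whose two UUIDs differ on every run. Drawing a fresh random token per leadership grant is what makes a stale sender detectable; the sketch does not show why the REST handler in this report kept using an outdated token.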
> FencingTokenException: Fencing token mismatch
> ---------------------------------------------
>
> Key: FLINK-21472
> URL: https://issues.apache.org/jira/browse/FLINK-21472
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.1
> Reporter: hayden zhou
> Priority: Major
> Attachments:
> flink--standalonesession-0-mta-flink-jobmanager-864d6c8cbb-rmsxw.log
>
>
> org.apache.flink.runtime.rest.handler.job.JobsOverviewHandler [] - Unhandled exception.
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token mismatch: Ignoring message LocalFencedMessage(8fac01d8e3e3988223a2e5c6e3f04f1e, LocalRpcInvocation(requestMultipleJobDetails(Time))) because the fencing token 8fac01d8e3e3988223a2e5c6e3f04f1e did not match the expected fencing token 8c37414f464bca76144e6cabc946474b.