[
https://issues.apache.org/jira/browse/FLINK-21432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289224#comment-17289224
]
Bhagi commented on FLINK-21432:
-------------------------------
Hi Yang,
I deployed standalone flink(12.0 version) cluster on kubernetes. increased the
job manager pod replicas count=3 and 2 tasks manager pods then i have
configured HA with kubernetes. But after HA configured UI is not properly
working.
+*Flink config yaml:*+
flink-conf.yaml: |+
jobmanager.rpc.address: flink-jobmanager
taskmanager.numberOfTaskSlots: 2
blob.server.port: 6124
jobmanager.rpc.port: 6123
taskmanager.rpc.port: 6122
queryable-state.proxy.ports: 6125
jobmanager.memory.process.size: 1600m
taskmanager.memory.process.size: 1728m
parallelism.default: 2
web.log.path: /opt/flink/log/output.log
taskmanager.log.path: /opt/flink/log/output.log
state.backend: rocksdb
state.checkpoints.dir: file:///persistent/flinkData/checkpoints
state.backend.rocksdb.log.dir: /persistent/flinkData/rocksdb/logging/
state.savepoints.dir: file:///persistent/flinkData/savepoints
state.backend.incremental: true
state.checkpoints.num-retained: 1
web.upload.dir: /persistent/flinkData
classloader.resolve-order: parent-first
kubernetes.cluster-id: 111
high-availability:
org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: file:///persistent/flinkData/checkpoints
security.ssl.rest.enabled: true
security.ssl.rest.keystore: /persistent/flinkData/ssl_certs/flink-keystore.jks
security.ssl.rest.truststore:
/persistent/flinkData/ssl_certs/flink-truststore.jks
security.ssl.rest.keystore-password: admin123
security.ssl.rest.key-password: admin123
security.ssl.rest.truststore-password: admin123
security.ssl.verify-hostname: false
2) Created configmaps for leader election and retrieval
*KubernetesLeaderElectionService.jave*
|1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16|{{private}} {{final}} {{LeaderElector leaderElector = kc.leaderElector()}}
{{ }}{{.withConfig(}}
{{ }}{{new}} {{LeaderElectionConfigBuilder()}}
{{ }}{{.withName(leaderKey)}}
{{ }}{{.withLeaseDuration(Duration.ofSeconds(15L))}}
{{ }}{{.withLock(}}{{new}} {{ConfigMapLock(ns, leaseName, lockIdentity))}}
{{ }}{{.withRenewDeadline(Duration.ofSeconds(10L))}}
{{ }}{{.withRetryPeriod(Duration.ofSeconds(2L))}}
{{ }}{{.withLeaderCallbacks(}}{{new}} {{LeaderCallbacks(}}
{{ }}{{this}}{{::isLeader,}}
{{ }}{{this}}{{::notLeader,}}
{{ }}{{newLeader -> LOG.info(}}{{"New leader elected {}."}}{{, newLeader)}}
{{ }}{{))}}
{{ }}{{.build())}}
{{ }}{{.build();}}
{{ }}{{executor.execute(leaderElector::run);}}|
h2. LeaderRetrieval
We could create a watcher for the ConfigMap and get the leader address in the
callback handler.
{panel:title=KubernetesLeaderRetrievalService.jave}
{panel}
|1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22|{{kubeClient.configMaps().withName(cm).watch(}}{{new}}
{{Watcher<ConfigMap>() {}}
{{ }}{{@Override}}
{{ }}{{public}} {{void}} {{eventReceived(Action action, ConfigMap resource)
{}}
{{ }}{{final}} {{String name = resource.getMetadata().getName();}}
{{ }}{{switch}} {{(action) {}}
{{ }}{{case}} {{ADDED:}}
{{ }}{{case}} {{MODIFIED:}}
{{ }}{{if}} {{(resource.getData() != }}{{null}}{{) {}}
{{ }}{{// TODO a new leader has been elected}}
{{ }}{{}}}
{{ }}{{break}}{{;}}
{{ }}{{case}} {{DELETED:}}
{{ }}{{listener.handleError(}}{{new}} {{Exception(}}{{"Deleted while
watching the configMap "}} {{+ name));}}
{{ }}{{break}}{{;}}
{{ }}{{case}} {{ERROR:}}
{{ }}{{listener.handleError(}}{{new}} {{Exception(}}{{"Error while
watching the configMap "}} {{+ name));}}
{{ }}{{break}}{{;}}
{{ }}{{default}}{{:}}
{{ }}{{LOG.debug(}}{{"Ignore handling {} event for configMap {}"}}{{,
action, resource.getMetadata().getName());}}
{{ }}{{break}}{{;}}
{{ }}{{}}}
{{}}}|
3) Jobmanager logs, shows leader election
!image-2021-02-23-23-12-03-118.png!
4) Flink UI is displaying this error information
!image-2021-02-23-23-14-13-708.png!
5) i have doubt, why leader id is showing differently for restserver, resource
manager & dispatcher.is this leader information is correct ??
New leader elected 65d77de7-59c8-4d6e-a321-85c102de0d51 for
111-restserver-leader.
New leader elected fc1a1d6c-6d39-4fae-8a99-a37afcd9b1c0 for
111-resourcemanager-leader
New leader elected 2ddfff94-c165-4e68-b80a-fd1ca59b39a7 for
111-dispatcher-leader
> Web UI -- Error - {"errors":["Service temporarily unavailable due to an
> ongoing leader election. Please refresh."]}
> -------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-21432
> URL: https://issues.apache.org/jira/browse/FLINK-21432
> Project: Flink
> Issue Type: Bug
> Components: Deployment / Kubernetes
> Affects Versions: 1.12.0
> Environment: debian
> Reporter: Bhagi
> Priority: Critical
> Fix For: 1.12.0
>
> Attachments: image-2021-02-22-10-39-06-180.png,
> image-2021-02-23-23-12-03-118.png, image-2021-02-23-23-14-13-708.png
>
>
> Web UI – throwing this Error
> {"errors":["Service temporarily unavailable due to an ongoing leader
> election. Please refresh."]}
> Please find Job Manager logs.
>
> !image-2021-02-22-10-39-06-180.png!
--
This message was sent by Atlassian Jira
(v8.3.4#803005)