[
https://issues.apache.org/jira/browse/FLINK-15642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078996#comment-17078996
]
Yang Wang commented on FLINK-15642:
-----------------------------------
[~felixzheng] The reason I want to add this feature is to keep the
JobManager from hanging without any response for a long time.
For YARN deployments, the YARN ResourceManager is responsible for the liveness
of the JobManager. When the JobManager does not send a heartbeat for a while
(600s by default), it is killed and a new JobManager is started. So I am
thinking of adding the same feature to K8s by using a liveness check.
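As a rough sketch of how the YARN behaviour could map onto K8s: Kubernetes restarts a container once its liveness probe has failed `failureThreshold` times in a row, so a 60s period with a threshold of 10 gives a timeout of roughly 600s, close to the YARN default. The probed port (6123, the default `jobmanager.rpc.port`) and all of the values below are assumptions for illustration, not settled choices.

{code:yaml}
# Hypothetical JobManager container fragment; all values are illustrative assumptions.
livenessProbe:
  tcpSocket:
    port: 6123             # default jobmanager.rpc.port
  initialDelaySeconds: 30  # give the JobManager time to start before probing
  periodSeconds: 60        # probe once per minute
  timeoutSeconds: 10       # a single probe attempt fails after 10s
  failureThreshold: 10     # ~10 consecutive failures (~600s) before K8s restarts the container
{code}

Note that a TCP check only confirms that the port still accepts connections, so a hung JobManager whose socket is still listening may pass it. An HTTP-based check would be stricter.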
Also, a readiness check could help us verify whether the session cluster is
ready to accept Flink jobs.
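A readiness probe could, for example, query the JobManager's REST endpoint. The sketch below assumes the default `rest.port` (8081) and uses the `/overview` REST path. Which endpoint best reflects "ready to accept jobs" is still an open question, so treat the path and the timings as assumptions.

{code:yaml}
# Hypothetical readiness probe for a session-cluster JobManager; values are illustrative.
readinessProbe:
  httpGet:
    path: /overview        # Flink REST cluster overview; assumed to answer once the cluster is up
    port: 8081             # default rest.port
  initialDelaySeconds: 10
  periodSeconds: 10
  failureThreshold: 3      # mark the pod NotReady after 3 consecutive failures
{code}

Kubernetes treats any HTTP status from 200 to 399 as success, so the pod only becomes Ready once the REST endpoint responds.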
> Support to set JobManager readiness and liveness check
> ------------------------------------------------------
>
> Key: FLINK-15642
> URL: https://issues.apache.org/jira/browse/FLINK-15642
> Project: Flink
> Issue Type: Sub-task
> Components: Deployment / Kubernetes
> Reporter: Yang Wang
> Priority: Major
>
> The liveness of the TaskManager is controlled by the Flink Master. When a
> TaskManager fails or times out, a new pod is started to replace it. We need
> to add a liveness check for the JobManager.
>
> It is just like what we could do in the YAML:
> {code:yaml}
> ...
> livenessProbe:
>   tcpSocket:
>     port: 6123
>   initialDelaySeconds: 30
>   periodSeconds: 60
> ...
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)