[ https://issues.apache.org/jira/browse/FLINK-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17045578#comment-17045578 ]
Niels Basjes commented on FLINK-16288:
--------------------------------------
Yes, thank you.
That does what I was looking for.
I started my cluster with the command below, and the task managers now stay
active far longer than the default 30 seconds.
{code:bash}
# Keep idle task managers for one hour; the timeout is given in milliseconds (default: 30000).
./flink-1.10.0/bin/kubernetes-session.sh \
  -Dkubernetes.cluster-id=flink1100 \
  -Dtaskmanager.memory.process.size=8192m \
  -Dkubernetes.taskmanager.cpu=2 \
  -Dtaskmanager.numberOfTaskSlots=8 \
  -Dresourcemanager.taskmanager-timeout=3600000
{code}
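Because resourcemanager.taskmanager-timeout takes milliseconds, it is easy to
get the number of zeros wrong. As a rough sketch only, the value can be derived
from minutes like this. The helper name idle_timeout_ms and the variable
TIMEOUT_FLAG are hypothetical names made up for illustration, not part of Flink:

```shell
# Hypothetical helper (not part of Flink): convert minutes to milliseconds.
idle_timeout_ms() {
  echo $(( $1 * 60 * 1000 ))
}

# Build the option for a one-hour idle timeout.
TIMEOUT_FLAG="-Dresourcemanager.taskmanager-timeout=$(idle_timeout_ms 60)"

# Print the option so it can be checked before it is passed to kubernetes-session.sh.
echo "${TIMEOUT_FLAG}"
```

The printed option can then be appended to the kubernetes-session.sh call in
place of the hard-coded value.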
I've put up a small pull request to extend the documentation a bit so others
can find this more easily.
https://github.com/apache/flink/pull/11226
> Setting the TTL for discarding task pods on Kubernetes.
> -------------------------------------------------------
>
> Key: FLINK-16288
> URL: https://issues.apache.org/jira/browse/FLINK-16288
> Project: Flink
> Issue Type: Improvement
> Components: Deployment / Kubernetes
> Affects Versions: 1.10.0
> Reporter: Niels Basjes
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I'm experimenting with running Flink 1.10.0 on native Kubernetes (version
> 1.17).
> After a job ends the task pods that were used to run it are discarded quite
> quickly.
> I found that if my job goes wrong I have too little time to look at all of
> the logs.
> I propose having a new config setting that allows me to run Flink on k8s
> where I can set the minimum time before an idle task pod is discarded.
> That way I can start Flink with a pod TTL of an hour (or something like that)
> so I have enough time to go through the logs and figure out what I did wrong.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)