[ https://issues.apache.org/jira/browse/AIRFLOW-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062221#comment-17062221 ]
ASF subversion and git services commented on AIRFLOW-6014:
----------------------------------------------------------
Commit 2ae99e145374655c87068bce48e91f07a6567242 in airflow's branch
refs/heads/v1-10-test from atrbgithub
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=2ae99e1 ]
[AIRFLOW-6014] Handle pods which are preempted & deleted by kubernetes but not
restarted (#6606)
cherry-picked from 4e626be3c90d76fac7ffc3a6b5c6fed10753fd38
> Kubernetes executor - handle preempted deleted pods - queued tasks
> ------------------------------------------------------------------
>
> Key: AIRFLOW-6014
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6014
> Project: Apache Airflow
> Issue Type: Improvement
> Components: executor-kubernetes
> Affects Versions: 1.10.6
> Reporter: afusr
> Assignee: Daniel Imberman
> Priority: Minor
> Fix For: 1.10.10
>
>
> We have encountered an issue whereby, when using the Kubernetes executor with
> autoscaling, Airflow pods are preempted and Airflow never attempts to rerun
> them.
> This is partly a result of the following setting in the pod spec:
> restartPolicy: Never
> This makes sense: if a pod fails while running a task, we don't want
> Kubernetes to retry it, because retries should be controlled by Airflow.
> What we believe happens is that when autoscaling adds a new node, Kubernetes
> schedules a number of Airflow pods onto it, along with any pods required by
> Kubernetes itself or by DaemonSets. Because those pods have higher priority,
> the Airflow pods are preempted and deleted. You see messages such as:
>
> Preempted by kube-system/ip-masq-agent-xz77q on node
> gke-some--airflow-00000000-node-1ltl
>
> Within the Kubernetes executor, these pods end up with a status of Pending,
> and a DELETED event is received but not handled.
> The end result is that the tasks remain in the queued state forever.
>
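The failure mode described above can be illustrated with a small event handler. This is a minimal, hypothetical sketch, not the code merged in #6606: `PodEvent`, `process_event`, `run`, the `handle_pending_delete` flag and the plain-string state names are assumptions made for illustration. It replays simplified watch events (the event types ADDED, MODIFIED and DELETED, and the pod phases, follow the Kubernetes API) and shows that ignoring a DELETED event for a Pending pod leaves the task queued, whereas handling it hands the task back for rescheduling.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical task-state names, used only for this illustration.
QUEUED = "queued"
RESCHEDULE = "up_for_reschedule"
FAILED = "failed"
SUCCESS = "success"


@dataclass(frozen=True)
class PodEvent:
    """A simplified Kubernetes watch event for one worker pod (hypothetical shape)."""
    event_type: str  # "ADDED", "MODIFIED" or "DELETED"
    phase: str       # pod status.phase: "Pending", "Running", "Succeeded", "Failed"
    task_key: str    # hypothetical identifier linking the pod to its Airflow task


def process_event(event: PodEvent, handle_pending_delete: bool = True) -> Optional[str]:
    """Map a pod event to a new task state, or None if the state should not change."""
    if event.phase == "Pending":
        if event.event_type == "DELETED" and handle_pending_delete:
            # The pod was deleted (e.g. preempted) before it ever ran. A bare pod
            # is not recreated by Kubernetes, so give the task back to the
            # scheduler instead of leaving it queued.
            return RESCHEDULE
        return None  # still waiting to be placed on a node
    if event.phase == "Failed":
        return FAILED
    if event.phase == "Succeeded":
        return SUCCESS
    return None  # e.g. "Running": nothing to record yet


def run(events: List[PodEvent], handle_pending_delete: bool = True) -> Dict[str, str]:
    """Replay events against tasks that all start in the queued state."""
    states: Dict[str, str] = {e.task_key: QUEUED for e in events}
    for event in events:
        new_state = process_event(event, handle_pending_delete)
        if new_state is not None:
            states[event.task_key] = new_state
    return states


events = [
    PodEvent("ADDED", "Pending", "dag.task_a"),
    PodEvent("DELETED", "Pending", "dag.task_a"),  # preempted before starting
    PodEvent("ADDED", "Pending", "dag.task_b"),
    PodEvent("MODIFIED", "Running", "dag.task_b"),
    PodEvent("MODIFIED", "Succeeded", "dag.task_b"),
]

print(run(events, handle_pending_delete=False))
# {'dag.task_a': 'queued', 'dag.task_b': 'success'}
print(run(events))
# {'dag.task_a': 'up_for_reschedule', 'dag.task_b': 'success'}
```

Keeping the DELETED-while-Pending check separate from the ordinary Pending case matters: a Pending pod that is merely waiting for a node should not change the task state, while one that has been deleted will never progress and must be released.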
--
This message was sent by Atlassian Jira
(v8.3.4#803005)