morhook commented on issue #18041: URL: https://github.com/apache/airflow/issues/18041#issuecomment-956541946
On our cluster, we suspect this is related to running out of disk space at the node level in Kubernetes, which triggers node-pressure eviction: https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/ Because `KubernetesExecutor` creates bare `Pods` (rather than `Deployments` or `Jobs`), these evictions cause the DAGs/tasks to be marked as failed on the Airflow side, and Kubernetes does not retry them.

Side note: we are running Airflow 1.10.7, but the problem seems to be the same on Airflow 2.x.
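One way to check whether failed task pods were evicted, rather than crashing on their own, is to look at the pod status. The sketch below assumes the pod list was exported with `kubectl get pods -o json`. It relies on the kubelet marking an evicted pod with `status.phase == "Failed"` and `status.reason == "Evicted"`. The helper names and the sample pod names are hypothetical and are not part of Airflow.

```python
from typing import Any, Dict, List


def is_evicted(pod: Dict[str, Any]) -> bool:
    """Return True if a pod object (as in `kubectl get pods -o json`) was evicted.

    Assumption: evicted pods end in phase "Failed" with reason "Evicted".
    """
    status = pod.get("status", {})
    return status.get("phase") == "Failed" and status.get("reason") == "Evicted"


def evicted_pod_names(pod_list: Dict[str, Any]) -> List[str]:
    """Return the names of evicted pods in a PodList JSON document."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if is_evicted(item)
    ]


# Hypothetical sample shaped like `kubectl get pods -o json` output.
SAMPLE_POD_LIST = {
    "items": [
        {"metadata": {"name": "example-task-pod-1"},
         "status": {"phase": "Failed", "reason": "Evicted"}},
        {"metadata": {"name": "example-task-pod-2"},
         "status": {"phase": "Succeeded"}},
    ]
}

if __name__ == "__main__":
    print(evicted_pod_names(SAMPLE_POD_LIST))
```

In practice, you would load the real `kubectl` JSON output with `json.load` in place of `SAMPLE_POD_LIST`. Whether the evicted pods are still around to inspect depends on how the executor is configured to clean up worker pods.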
