[ https://issues.apache.org/jira/browse/AIRFLOW-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ash Berlin-Taylor resolved AIRFLOW-2966.
----------------------------------------
Resolution: Fixed
> KubernetesExecutor + namespace quotas kills scheduler if the pod can't be launched
> ----------------------------------------------------------------------------------
>
> Key: AIRFLOW-2966
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2966
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 2.0.0
> Environment: Kubernetes 1.9.8
> Reporter: John Hofman
> Assignee: John Hofman
> Priority: Major
> Fix For: 2.0.0
>
>
> This occurs when running Airflow in Kubernetes with the KubernetesExecutor
> and resource quotas set on the namespace Airflow is deployed in. If the
> scheduler tries to launch a pod that would exceed the namespace quota, the
> Kubernetes client raises an ApiException, which crashes the scheduler.
> This stack trace is an example of the ApiException from the kubernetes client:
> {code:java}
> [2018-08-27 09:51:08,516] {pod_launcher.py:58} ERROR - Exception when attempting to create Namespaced Pod.
> Traceback (most recent call last):
>   File "/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line 55, in run_pod_async
>     resp = self._client.create_namespaced_pod(body=req, namespace=pod.namespace)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6057, in create_namespaced_pod
>     (data) = self.create_namespaced_pod_with_http_info(namespace, body, **kwargs)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/apis/core_v1_api.py", line 6142, in create_namespaced_pod_with_http_info
>     collection_formats=collection_formats)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 321, in call_api
>     _return_http_data_only, collection_formats, _preload_content, _request_timeout)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 155, in __call_api
>     _request_timeout=_request_timeout)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 364, in request
>     body=body)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 266, in POST
>     body=body)
>   File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 222, in request
>     raise ApiException(http_resp=r)
> kubernetes.client.rest.ApiException: (403)
> Reason: Forbidden
> HTTP response headers: HTTPHeaderDict({'Audit-Id': 'b00e2cbb-bdb2-41f3-8090-824aee79448c', 'Content-Type': 'application/json', 'Date': 'Mon, 27 Aug 2018 09:51:08 GMT', 'Content-Length': '410'})
> HTTP response body:
> {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"podname-ec366e89ef934d91b2d3ffe96234a725\" is forbidden: exceeded quota: compute-resources, requested: limits.memory=4Gi, used: limits.memory=6508Mi, limited: limits.memory=10Gi","reason":"Forbidden","details":{"name":"podname-ec366e89ef934d91b2d3ffe96234a725","kind":"pods"},"code":403}
> {code}
>
> I would expect the scheduler to catch the exception and at least mark the
> task as failed or, better yet, retry the task later.
>
>
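For illustration only, the behaviour the reporter asks for (catch the exception, then retry or fail the task instead of crashing) might look roughly like the sketch below. This is not the fix that was merged for this issue. The `ApiException` class here is a local stand-in that mimics only the `status` and `body` attributes of `kubernetes.client.rest.ApiException` so the example runs without a cluster, and `is_quota_error`, `drain`, `launch_pod` and `max_passes` are hypothetical names, not Airflow APIs.

```python
import json
from collections import deque


class ApiException(Exception):
    """Stand-in for kubernetes.client.rest.ApiException (status and body only)."""

    def __init__(self, status, body):
        super().__init__(f"({status})")
        self.status = status
        self.body = body


def is_quota_error(exc):
    """Return True if exc looks like the 403 'exceeded quota' response quoted above."""
    if exc.status != 403:
        return False
    try:
        payload = json.loads(exc.body)
    except (TypeError, ValueError):
        return False
    if not isinstance(payload, dict):
        return False
    return "exceeded quota" in payload.get("message", "")


def drain(task_queue, launch_pod, max_passes=3):
    """Launch queued tasks; re-queue quota rejections, collect other failures.

    Each pass stands in for one scheduler loop. Returns the list of tasks
    that failed for a reason other than a quota rejection.
    """
    failed = []
    for _ in range(max_passes):
        retry_later = deque()
        while task_queue:
            task = task_queue.popleft()
            try:
                launch_pod(task)
            except ApiException as exc:
                if is_quota_error(exc):
                    # Quota may free up later, so keep the task for the next pass.
                    retry_later.append(task)
                else:
                    # Any other API error: report the task as failed.
                    failed.append(task)
        task_queue.extend(retry_later)
        if not task_queue:
            break
    return failed
```

The point of the sketch is only that the exception is handled per task, so one rejected pod cannot take down the loop. Tasks still in `task_queue` after `max_passes` passes are left for the caller to handle, and a real scheduler would wait between passes rather than retrying immediately.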