[
https://issues.apache.org/jira/browse/AIRFLOW-6040?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16980186#comment-16980186
]
Max edited comment on AIRFLOW-6040 at 11/22/19 7:15 PM:
--------------------------------------------------------
We ran into this same issue. I believe this is actually an issue in the
upstream [kubernetes|https://github.com/kubernetes-client/python] package and
not Airflow.
The exception is thrown from [this
loop|https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/executors/kubernetes_executor.py#L356].
It passes {{label_selector="airflow-worker=<uuid>"}} to the
{{list_namespaced_pod()}} method. When used in a {{Watch()}}, the stream
yields nothing while no Pods match the given UUID, so the connection sits
idle. Once the read timeout from the {{_request_timeout}} [config
setting|https://github.com/apache/airflow/blob/1.10.6/airflow/config_templates/default_airflow.cfg#L828]
elapses, the underlying {{urllib3}} library raises a timeout exception, which
{{Watch()}} does not handle.
You can easily reproduce this by running a simple Python pod (in your Airflow
namespace so it has the same ServiceAccount permissions) and executing the
following snippet:
{code:bash}
$ kubectl -n <your-namespace> run -i -t python \
    --image=python:3.7.4-slim-stretch --restart=Never --command -- /bin/sh
# pip install kubernetes
# python
>>> from kubernetes import config, client, watch
>>> from kubernetes.client.rest import ApiException
>>> config.load_incluster_config()
>>> k8s = client.CoreV1Api()
>>> watcher = watch.Watch()
>>> namespace = "<your-namespace>"
>>> # No Pod carries this label, so the watch stays idle until the read timeout.
>>> for event in watcher.stream(k8s.list_namespaced_pod, namespace,
...         resource_version="0", label_selector="airflow-worker=dont-find-this",
...         _request_timeout=(60, 60)):
...     print(event['object'])
...
{code}
I've observed this behavior with Airflow 1.10.5 and 1.10.6, Python 2.7 and
3.7, K8s 1.15 and 1.16, and urllib3 1.24 and 1.25.
Setting {{timeout_seconds=50}} in the [Watch()
loop|https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/executors/kubernetes_executor.py#L356]
produces a warning instead of an exception. {{timeout_seconds}} is passed to the
[list_namespaced_pod|https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#list_namespaced_pod]
method rather than to the underlying urllib3 library, so it appears to act as
a server-side limit that ends an idle watch before the 60-second client-side
read timeout fires.
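
To make the relationship between the two timeouts concrete, here is a minimal sketch. The helper names {{build_watch_kwargs}} and {{watch_worker_pods}} are hypothetical (they are not part of Airflow or the kubernetes client); the only assumption is that the watch duration should stay below the urllib3 read timeout, as with the 50 and 60 second values above.

```python
def build_watch_kwargs(worker_uuid, connect_timeout=60, read_timeout=60, margin=10):
    """Build keyword arguments for Watch().stream(list_namespaced_pod, ...).

    Hypothetical helper: keeps timeout_seconds below the client-side read
    timeout so an idle watch ends before urllib3 gives up on the connection.
    """
    if margin <= 0 or margin >= read_timeout:
        raise ValueError("margin must be greater than 0 and less than read_timeout")
    return {
        "label_selector": "airflow-worker={}".format(worker_uuid),
        "resource_version": "0",
        # Client-side timeouts handed to urllib3: (connect, read), in seconds.
        "_request_timeout": (connect_timeout, read_timeout),
        # Duration limit for the list/watch call itself.
        "timeout_seconds": read_timeout - margin,
    }


def watch_worker_pods(namespace, worker_uuid):
    """Stream Pod events for one worker UUID (needs in-cluster credentials)."""
    # Imported here so build_watch_kwargs() works without the kubernetes package.
    from kubernetes import client, config, watch

    config.load_incluster_config()
    k8s = client.CoreV1Api()
    kwargs = build_watch_kwargs(worker_uuid)
    for event in watch.Watch().stream(k8s.list_namespaced_pod, namespace, **kwargs):
        print(event["type"], event["object"].metadata.name)
```

With the defaults, {{build_watch_kwargs("dont-find-this")}} gives {{timeout_seconds=50}} and {{_request_timeout=(60, 60)}}, the same values used in this comment. {{watch_worker_pods()}} has to run inside the cluster, like the reproduction snippet.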
Hope this helps others who are facing this issue.
was (Author: fl-max):
We ran into this same issue. I believe this is actually an issue in the
upstream [kubernetes|https://github.com/kubernetes-client/python] package and
not Airflow.
The exception is thrown from [this
loop|https://github.com/apache/airflow/blob/1.10.6/airflow/contrib/executors/kubernetes_executor.py#L356].
It passes {{label_selector="airflow-worker=<uuid>"}} to the
{{list_namespaced_pod()}} method. When used in a {{Watch()}}, the stream
yields nothing while no Pods match the given UUID, so the connection sits
idle. Once the read timeout from the {{_request_timeout}} [config
setting|https://github.com/apache/airflow/blob/1.10.6/airflow/config_templates/default_airflow.cfg#L828]
elapses, the underlying {{urllib3}} library raises a timeout exception, which
{{Watch()}} does not handle.
You can easily reproduce this by running a simple Python pod (in your Airflow
namespace so it has the same ServiceAccount permissions) and executing the
following snippet:
{code:bash}
$ kubectl -n <your-namespace> run -i -t python \
    --image=python:3.7.4-slim-stretch --restart=Never --command -- /bin/sh
# pip install kubernetes
# python
>>> from kubernetes import config, client, watch
>>> from kubernetes.client.rest import ApiException
>>> config.load_incluster_config()
>>> k8s = client.CoreV1Api()
>>> watcher = watch.Watch()
>>> namespace = "<your-namespace>"
>>> # No Pod carries this label, so the watch stays idle until the read timeout.
>>> for event in watcher.stream(k8s.list_namespaced_pod, namespace,
...         resource_version="0", label_selector="airflow-worker=dont-find-this",
...         _request_timeout=(60, 60)):
...     print(event['object'])
...
{code}
I've observed this behavior with Airflow 1.10.5 and 1.10.6, Python 2.7 and
3.7, K8s 1.15 and 1.16, and urllib3 1.24 and 1.25.
As a workaround, setting
[kube_client_request_args|https://github.com/apache/airflow/blob/1.10.6/airflow/config_templates/default_airflow.cfg#L828]
to:
{noformat}
"{ \"_request_timeout\" : [60,60], \"timeout_seconds\" : 50 }"
{noformat}
produces a warning instead of an exception. {{timeout_seconds}} is passed to the
[list_namespaced_pod|https://github.com/kubernetes-client/python/blob/master/kubernetes/docs/CoreV1Api.md#list_namespaced_pod]
method rather than to the underlying urllib3 library, so it appears to act as
a server-side limit that ends an idle watch before the 60-second client-side
read timeout fires.
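
For anyone checking their own setting, here is a rough sketch of how such a JSON value could be turned into keyword arguments. {{parse_request_args}} is a hypothetical helper and not necessarily how Airflow reads the option; it assumes the value is plain JSON once the shell-style escaping of the quotes is removed, and that the client wants {{_request_timeout}} as a tuple rather than a list.

```python
import json


def parse_request_args(raw):
    """Parse a JSON string of client request arguments into a dict.

    Hypothetical helper for illustration; JSON has no tuple type, so the
    [connect, read] list is converted to a (connect, read) tuple.
    """
    args = json.loads(raw) if raw else {}
    timeout = args.get("_request_timeout")
    if isinstance(timeout, list):
        args["_request_timeout"] = tuple(timeout)
    return args


# Same value as the workaround above, with the quote escaping removed.
raw_value = '{ "_request_timeout" : [60,60], "timeout_seconds" : 50 }'
request_args = parse_request_args(raw_value)
print(request_args)
```

The resulting dict carries both the urllib3 timeouts and {{timeout_seconds}}, so it could be unpacked as keyword arguments into the watch call.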
Hope this helps others who are facing this issue.
> Airflow scheduler with kubernetes executor fails :- Unknown error in
> KubernetesJobWatcher
> -----------------------------------------------------------------------------------------
>
> Key: AIRFLOW-6040
> URL: https://issues.apache.org/jira/browse/AIRFLOW-6040
> Project: Apache Airflow
> Issue Type: Bug
> Components: contrib, executor-kubernetes, scheduler
> Affects Versions: 1.10.6
> Reporter: Ashutosh Srivastava
> Assignee: Daniel Imberman
> Priority: Major
>
> I am trying to set up airflow with the kubernetes executor. I have cloned
> airflow 1.10.6 and am building the docker image and then deploying it with
> kube. The pods are running, the service airflow also starts. The webserver is
> working fine. But when I check the logs for the scheduler I get the following
> error.
>
> {{ERROR - Error while health checking kube watcher process. Process died for
> unknown reasons
> INFO - Event: and now my watch begins starting at resource_version: 0
> ERROR - Unknown error in KubernetesJobWatcher. Failing
> Traceback (most recent call last):
> File
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/executors/kubernetes_executor.py",
> line 333, in run
> self.worker_uuid, self.kube_config)
> File
> "/usr/local/lib/python2.7/dist-packages/airflow/contrib/executors/kubernetes_executor.py",
> line 358, in _run
> **kwargs):
> File "/usr/local/lib/python2.7/dist-packages/kubernetes/watch/watch.py",
> line 144, in stream
> for line in iter_resp_lines(resp):
> File "/usr/local/lib/python2.7/dist-packages/kubernetes/watch/watch.py",
> line 48, in iter_resp_lines
> for seg in resp.read_chunked(decode_content=False):
> File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line
> 781, in read_chunked
> self._original_response.close()
> File "/usr/lib/python2.7/contextlib.py", line 35, in __exit__
> self.gen.throw(type, value, traceback)
> File "/usr/local/lib/python2.7/dist-packages/urllib3/response.py", line
> 439, in _error_catcher
> raise ReadTimeoutError(self._pool, None, "Read timed out.")
> ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed
> out.}}