[
https://issues.apache.org/jira/browse/AIRFLOW-5447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16930644#comment-16930644
]
Chris Wegrzyn commented on AIRFLOW-5447:
----------------------------------------
After a bit of wrestling with pyrasite and probably dumb luck, I managed to get
what appears to be a telling stack trace:
{code:java}
Thread 0x7fb39d56d700
  File "/usr/local/airflow/.local/bin/airflow", line 32, in <module>
    args.func(args)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1013, in scheduler
    job.run()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 213, in run
    self._execute()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1350, in _execute
    self._execute_helper()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1439, in _execute_helper
    self.executor.heartbeat()
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 132, in heartbeat
    self.trigger_tasks(open_slots)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/executors/base_executor.py", line 156, in trigger_tasks
    executor_config=simple_ti.executor_config)
  File "/usr/local/airflow/.local/lib/python3.7/site-packages/airflow/contrib/executors/kubernetes_executor.py", line 767, in execute_async
    self.task_queue.put((key, command, kube_executor_config))
  File "<string>", line 2, in put
  File "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
    kind, result = conn.recv()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
    buf = self._recv_bytes()
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
  File "<string>", line 1, in <module>
  File "<string>", line 5, in <module>
{code}
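For anyone who wants a dump in the same "Thread 0x..." layout without attaching pyrasite, here is a minimal in-process sketch. It relies only on sys._current_frames() and traceback.format_stack() from the standard library. The names format_thread_stacks, blocked_worker, and dump_while_blocked are hypothetical and exist only for illustration, and I am not claiming this is how the trace above was produced.

```python
import sys
import threading
import traceback


def format_thread_stacks():
    """Return a text dump of the current stack of every live thread."""
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread 0x%x" % thread_id)
        # format_stack() yields newline-terminated entries; strip the
        # trailing newline so the join below does not double-space them.
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)


def blocked_worker(started, release):
    """Hypothetical stand-in for a call that does not return."""
    started.set()
    release.wait()


def dump_while_blocked():
    """Start a blocked thread, capture all stacks, then let it finish."""
    started = threading.Event()
    release = threading.Event()
    worker = threading.Thread(target=blocked_worker, args=(started, release))
    worker.start()
    started.wait()
    try:
        return format_thread_stacks()
    finally:
        release.set()
        worker.join()
```

Calling print(dump_while_blocked()) should list the main thread and the worker, with the worker's innermost frames inside blocked_worker. In a real scheduler this would have to be triggered from inside the process (for example from a signal handler), which is an assumption about the deployment and not something I have tried here.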
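The scheduler thread is blocked in conn.recv() inside the manager proxy's
_callmethod, waiting for a reply to self.task_queue.put(...), so it does seem
like something is going wrong with the communication behind the put to the
task_queue.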
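To make that failure mode concrete, here is a small sketch of a request/reply exchange over a multiprocessing connection in which the peer never answers. It uses Pipe() from the standard library inside a single process, so it only imitates the send-then-recv pattern visible in the trace; it is not Airflow code and not a proposed fix. The helper names call_with_timeout and answer_one, the ("put", "task") request, and the ("ok", None) reply are all made up for the example. The frames above show a plain recv() with no timeout, which is why the real call can wait forever.

```python
import threading
from multiprocessing import Pipe


def call_with_timeout(conn, request, timeout):
    """Send a request and wait at most `timeout` seconds for a reply."""
    conn.send(request)
    # poll() returns False if nothing arrives in time, so we never reach
    # a recv() that would block indefinitely.
    if not conn.poll(timeout):
        raise TimeoutError("no reply to %r within %.1fs" % (request, timeout))
    return conn.recv()


def answer_one(conn):
    """Hypothetical well-behaved peer: read one request, send one reply."""
    conn.recv()
    conn.send(("ok", None))


def demo():
    """Return (silent_peer_timed_out, reply_from_responsive_peer)."""
    # Case 1: nobody ever reads or answers on the other end.
    client, _silent_server = Pipe()
    try:
        call_with_timeout(client, ("put", "task"), 0.2)
        timed_out = False
    except TimeoutError:
        timed_out = True

    # Case 2: a peer thread answers the request.
    client2, server2 = Pipe()
    peer = threading.Thread(target=answer_one, args=(server2,))
    peer.start()
    reply = call_with_timeout(client2, ("put", "task"), 2.0)
    peer.join()
    return timed_out, reply
```

With a silent peer the call raises TimeoutError instead of hanging, which is the behaviour one would want to observe when debugging; whether the scheduler's queue proxy can be wrapped this way is an open question.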
> KubernetesExecutor hangs on task queueing
> -----------------------------------------
>
> Key: AIRFLOW-5447
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5447
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor-kubernetes
> Affects Versions: 1.10.4, 1.10.5
> Environment: Kubernetes version v1.14.3, Airflow version 1.10.4-1.10.5
> Reporter: Henry Cohen
> Assignee: Daniel Imberman
> Priority: Blocker
>
> Starting in 1.10.4, and continuing in 1.10.5, when using the
> KubernetesExecutor with the webserver and scheduler running in the
> Kubernetes cluster, tasks are scheduled, but when they are added to the task
> queue, the executor process hangs indefinitely. Based on log messages, it
> appears to be stuck at this line:
> https://github.com/apache/airflow/blob/v1-10-stable/airflow/contrib/executors/kubernetes_executor.py#L761