[
https://issues.apache.org/jira/browse/AIRFLOW-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996849#comment-16996849
]
Gaurav Sehgal edited comment on AIRFLOW-4424 at 12/15/19 7:03 PM:
------------------------------------------------------------------
Hi, at GoJek we are facing the same issue with the LocalExecutor. Here is the
thread dump from the hung scheduler.
```
ThreadID: 140356901611264
File: "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
  self._bootstrap_inner()
File: "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
  self.run()
File: "<string>", line 167, in run
File: "/usr/local/lib/python3.7/code.py", line 232, in interact
  more = self.push(line)
File: "/usr/local/lib/python3.7/code.py", line 258, in push
  more = self.runsource(source, self.filename)
File: "/usr/local/lib/python3.7/code.py", line 74, in runsource
  self.runcode(code)
File: "/usr/local/lib/python3.7/code.py", line 90, in runcode
  exec(code, self.locals)
File: "<console>", line 3, in <module>

ThreadID: 140358376056576
File: "/usr/local/bin/airflow", line 37, in <module>
  args.func(args)
File: "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
  return f(*args, **kwargs)
File: "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1042, in scheduler
  job.run()
File: "/usr/local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 222, in run
  self._execute()
File: "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1356, in _execute
  self._execute_helper()
File: "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1496, in _execute_helper
  self.executor.end()
File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 233, in end
  self.impl.end()
File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 212, in end
  self.queue.join()
File: "<string>", line 2, in join
File: "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
  kind, result = conn.recv()
File: "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
  buf = self._recv_bytes()
File: "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
  buf = self._recv(4)
File: "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
  chunk = read(handle, remaining)
```
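The main thread in the dump is blocked in `self.queue.join()`, reached through a `multiprocessing` manager proxy. As a rough illustration of how such a call can block indefinitely, the sketch below uses a plain `queue.Queue`, which is the type a manager queue proxies. The dump does not show why the join never returns, so the missing `task_done()` call here is only an assumed cause, and the helper `join_completes` is a hypothetical name used for this example.

```python
import queue
import threading


def join_completes(q: queue.Queue, timeout: float = 0.5) -> bool:
    """Return True if q.join() returns within `timeout` seconds."""
    # Run join() in a daemon thread so a hang cannot block this script.
    waiter = threading.Thread(target=q.join, daemon=True)
    waiter.start()
    waiter.join(timeout)
    return not waiter.is_alive()


q = queue.Queue()
q.put("task-1")
q.get()  # a worker takes the item but never acknowledges it

# Assumed failure mode: join() waits until every item put on the queue
# has been matched by a task_done() call, so it is stuck here.
stuck = not join_completes(q)

q.task_done()  # acknowledging the item lets join() return
released = join_completes(q)

print(stuck, released)
```

If a worker exits or is killed between `get()` and `task_done()`, the unfinished-task count never reaches zero, which would produce a wait like the one at the bottom of the dump.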
> Scheduler does not terminate after num_runs when executor is
> KubernetesExecutor
> -------------------------------------------------------------------------------
>
> Key: AIRFLOW-4424
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4424
> Project: Apache Airflow
> Issue Type: Bug
> Components: executors, scheduler
> Affects Versions: 1.10.3
> Environment: EKS, deployed with stable airflow helm chart
> Reporter: Brian Nutt
> Priority: Blocker
> Labels: kubernetes
> Fix For: 2.0.0
>
>
> When using an executor like the CeleryExecutor with num_runs set on the
> scheduler, the scheduler pod restarts after num_runs runs have completed.
> After switching to KubernetesExecutor, the scheduler logs:
> [2019-04-26 19:20:43,562] {kubernetes_executor.py:770} INFO - Shutting down Kubernetes executor
> However, the scheduler process does not exit, so the scheduler pod never
> restarts to run num_runs again. We had to roll back to CeleryExecutor
> because, with num_runs set to -1, the scheduler builds up a large number of
> defunct processes, which eventually prevents tasks from being scheduled
> because the underlying nodes run out of file descriptors.
>