[ 
https://issues.apache.org/jira/browse/AIRFLOW-4424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16996849#comment-16996849
 ] 

Gaurav Sehgal edited comment on AIRFLOW-4424 at 12/15/19 7:04 PM:
------------------------------------------------------------------

Hi, at GoJek we are facing the same issue with the LocalExecutor. Here's the 
thread dump. 
{code:java}
ThreadID: 140356901611264
 File: "/usr/local/lib/python3.7/threading.py", line 890, in _bootstrap
 self._bootstrap_inner()
 File: "/usr/local/lib/python3.7/threading.py", line 926, in _bootstrap_inner
 self.run()
 File: "<string>", line 167, in run
 File: "/usr/local/lib/python3.7/code.py", line 232, in interact
 more = self.push(line)
 File: "/usr/local/lib/python3.7/code.py", line 258, in push
 more = self.runsource(source, self.filename)
 File: "/usr/local/lib/python3.7/code.py", line 74, in runsource
 self.runcode(code)
 File: "/usr/local/lib/python3.7/code.py", line 90, in runcode
 exec(code, self.locals)
 File: "<console>", line 3, in <module>

ThreadID: 140358376056576
 File: "/usr/local/bin/airflow", line 37, in <module>
 args.func(args)
 File: "/usr/local/lib/python3.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
 return f(*args, **kwargs)
 File: "/usr/local/lib/python3.7/site-packages/airflow/bin/cli.py", line 1042, in scheduler
 job.run()
 File: "/usr/local/lib/python3.7/site-packages/airflow/jobs/base_job.py", line 222, in run
 self._execute()
 File: "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1356, in _execute
 self._execute_helper()
 File: "/usr/local/lib/python3.7/site-packages/airflow/jobs/scheduler_job.py", line 1496, in _execute_helper
 self.executor.end()
 File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 233, in end
 self.impl.end()
 File: "/usr/local/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 212, in end
 self.queue.join()
 File: "<string>", line 2, in join
 File: "/usr/local/lib/python3.7/multiprocessing/managers.py", line 819, in _callmethod
 kind, result = conn.recv()
 File: "/usr/local/lib/python3.7/multiprocessing/connection.py", line 250, in recv
 buf = self._recv_bytes()
 File: "/usr/local/lib/python3.7/multiprocessing/connection.py", line 407, in _recv_bytes
 buf = self._recv(4)
 File: "/usr/local/lib/python3.7/multiprocessing/connection.py", line 379, in _recv
 chunk = read(handle, remaining)
{code}
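
For anyone reading the dump: the main thread (140358376056576) is parked in `self.queue.join()`, reached through `multiprocessing/managers.py`, so it is waiting on a manager-backed queue. A queue's `join()` only returns once `task_done()` has been called for every item that was put. Below is a minimal sketch of that behaviour, not the Airflow code. It uses a plain `queue.Queue` (the class that `multiprocessing.Manager().Queue()` proxies), and the helper names `join_finishes` and `drain` are hypothetical. Whether a missing `task_done()` is what actually happens in the scheduler is an assumption; the dump alone does not confirm it.

```python
import queue
import threading


def join_finishes(q, timeout=0.5):
    """Return True if q.join() returns within `timeout` seconds."""
    waiter = threading.Thread(target=q.join, daemon=True)
    waiter.start()
    waiter.join(timeout)
    return not waiter.is_alive()


def drain(q, acknowledge):
    """Hypothetical worker: consume every queued item, optionally acknowledging each."""
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return
        if acknowledge:
            q.task_done()


# Every item is consumed and acknowledged, so join() returns.
healthy = queue.Queue()
for item in range(3):
    healthy.put(item)
drain(healthy, acknowledge=True)

# Items are consumed but never acknowledged (assumed stand-in for a worker
# that died before calling task_done()), so join() keeps waiting.
stuck = queue.Queue()
for item in range(3):
    stuck.put(item)
drain(stuck, acknowledge=False)

healthy_done = join_finishes(healthy)
stuck_done = join_finishes(stuck)
print(healthy_done, stuck_done)  # True False
```

The second queue is empty, yet `join()` still blocks, because it tracks unacknowledged items rather than queue length. That matches the shape of the hang in the dump, where the caller sits in `recv()` waiting for the manager to answer the `join` call.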


> Scheduler does not terminate after num_runs when executor is 
> KubernetesExecutor
> -------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4424
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4424
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executors, scheduler
>    Affects Versions: 1.10.3
>         Environment: EKS, deployed with stable airflow helm chart
>            Reporter: Brian Nutt
>            Priority: Blocker
>              Labels: kubernetes
>             Fix For: 2.0.0
>
>
> When using an executor such as the CeleryExecutor with num_runs set on the 
> scheduler, the scheduler pod restarts after num_runs runs have completed. After 
> switching to KubernetesExecutor, the scheduler logs:
> [2019-04-26 19:20:43,562] {kubernetes_executor.py:770} INFO - Shutting down Kubernetes executor
> However, the scheduler process does not exit, so the scheduler pod never 
> restarts to run another num_runs loops. We had to roll back to 
> CeleryExecutor, because with num_runs set to -1 the scheduler builds up many 
> defunct processes, which eventually prevents tasks from being scheduled once 
> the underlying nodes run out of file descriptors.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
