Narendra-Neerukonda opened a new issue #14166:
URL: https://github.com/apache/airflow/issues/14166
<details>
<summary>output of ps -elf</summary>
F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
4 S airflow 1 0 0 80 0 - 2921 do_wai 05:08 ? 00:00:00 sh /airflow/start-airflow.sh
4 S airflow 8 1 1 80 0 - 122377 poll_s 05:08 ? 00:02:08 /opt/conda/envs/airflow_app/bin/python /opt/conda/envs/airflow_app/bin/airflow scheduler
5 S airflow 19 8 2 80 0 - 288560 futex_ 05:08 ? 00:03:00 /opt/conda/envs/airflow_app/bin/python /opt/conda/envs/airflow_app/bin/airflow scheduler
5 S airflow 27 8 1 80 0 - 308108 futex_ 05:08 ? 00:01:34 /opt/conda/envs/airflow_app/bin/python /opt/conda/envs/airflow_app/bin/airflow scheduler
5 S airflow 34 8 0 80 0 - 121993 sk_wai 05:08 ? 00:00:13 /opt/conda/envs/airflow_app/bin/python /opt/conda/envs/airflow_app/bin/airflow scheduler
5 S airflow 38 8 4 80 0 - 123007 poll_s 05:08 ? 00:05:23 airflow scheduler -- DagFileProcessorManager
4 S airflow 47723 0 0 80 0 - 3476 do_wai 07:12 pts/0 00:00:00 bash
4 R airflow 48145 47723 0 80 0 - 13456 - 07:12 pts/0 00:00:00 ps -elf
5 Z airflow 48457 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 Z airflow 48459 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 Z airflow 48464 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 Z airflow 48467 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 S airflow 48470 38 0 80 0 - 124282 sk_wai 07:12 ? 00:00:00 airflow scheduler - DagFileProcessor /airflow/dags/dev_demo_dag_airflow-svccs-20210121164806.zip
5 Z airflow 48475 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 R airflow 48477 38 0 80 0 - 126660 - 07:12 ? 00:00:00 airflow scheduler - DagFileProcessor /airflow/dags/BashPipeline-v2-0-0-opr-svccs-20200528162736.zip
5 Z airflow 48479 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 Z airflow 48481 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
5 Z airflow 48484 38 0 80 0 - 0 do_exi 07:12 ? 00:00:00 [airflow schedul] <defunct>
</details>
**Apache Airflow version**: 1.10.10
**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): 1.19.5
**Environment**: Airflow Docker image (python:3.6-slim-buster)
- **OS** (e.g. from /etc/os-release): RHEL 7.5
- **Kernel** (e.g. `uname -a`): 3.10.0-1160.11.1.el7.x86_64
**What happened**: DagFileProcessorManager creates defunct processes after running for some duration (usually hours).
**What you expected to happen**: The defunct processes should be cleaned up by the DagFileProcessorManager.
I have the `num_runs` parameter set to -1, which causes the DagFileProcessorManager to keep spawning processes to try and schedule DAGs indefinitely (as expected). However, as this process keeps running, we eventually end up with some defunct processes. Ideally these would have been cleared up when the DagFileProcessorManager exited, but because of the num_runs=-1 setting it never dies, and the defunct processes are never reaped.
In the provided output of `ps -elf`, we can see that process 38 has created several child processes which went defunct and will now only disappear if the parent is killed or collects their exit status.
**How to reproduce it**: Run the Airflow Docker image with `num_runs` not set in the config, or set to -1 (see the config sketch below), and keep a few DAGs in the dags folder for Airflow to keep loading.
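For reference, this is roughly the setting involved, assuming (as on our setup) that `num_runs` lives under the `[scheduler]` section of airflow.cfg:

```ini
[scheduler]
# -1 keeps the DagFileProcessorManager loop running indefinitely,
# so it never exits and never gets a natural chance to reap its children.
num_runs = -1
```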
**Anything else we need to know**:
Since processes go defunct waiting for the parent to collect their exit code, I'm assuming that adding a call for the exit code and a terminate call on finished processors in dag_processing.py (DagFileProcessorManager.collect_results) would collect the status of the processes launched by the DagFileProcessorManager and clean them up once they have finished executing. All the defunct processes have actually stopped; I can see that in the logs when `if processor.done` is evaluated in `collect_results`. For example, one of the defunct processes is 48464, and I see this statement in the log: `DEBUG - Waiting for <Process(DagFileProcessor48464-Process, stopped)>`.
```python
for file_path, processor in finished_processors.items():
    if processor.result is None:
        self.log.error(
            "Processor for %s exited with return code %s.",
            processor.file_path, processor.exit_code
        )
    else:
        for simple_dag in processor.result[0]:
            simple_dags.append(simple_dag)
    self.log.info("Processor for %s exited with return code %s.",
                  processor.file_path, processor.exit_code)  # <- proposed addition
    processor.terminate()                                    # <- proposed addition
```
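To spell out what those extra calls would need to accomplish: what actually clears a `<defunct>` entry is the parent wait()ing on the child. A hypothetical helper (not Airflow's actual API) around a processor's underlying `multiprocessing.Process` could look like this:

```python
import multiprocessing
import time


def reap_processor(process, timeout=5.0):
    """Hypothetical helper: stop a finished DagFileProcessor child and wait
    on it so the kernel can remove its <defunct> entry."""
    process.terminate()    # SIGTERM; harmless if the child has already exited
    process.join(timeout)  # the join()/waitpid() is what reaps the zombie


if __name__ == "__main__":
    p = multiprocessing.Process(target=time.sleep, args=(0.1,))
    p.start()
    time.sleep(0.5)        # the child has exited but is still a zombie here
    reap_processor(p)
    print(p.exitcode)      # 0 once the child has been collected
```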
**How often does this problem occur? Once? Every time etc?**: Every time, after a few hours of starting the Airflow scheduler.