[
https://issues.apache.org/jira/browse/AIRFLOW-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16922191#comment-16922191
]
Johannes Kaufmann commented on AIRFLOW-4499:
--------------------------------------------
We are also facing this issue. It's not the same as AIRFLOW-401 either, since CPU
load is 0, so it looks more like the scheduler process is in a zombie-like state.
We set -r 1800, but periodically the scheduler does not restart.
Looking through the syslogs yields the following pattern (I removed the overly
verbose beginning of every log line and added some comments about the
structure):
[2019-08-29 03:50:55,196] {jobs.py:406} INFO - Processing /path/to/airflow/dags/all_purpose_dag.py took 53.097 seconds
[2019-08-29 03:50:55,200] {settings.py:206} DEBUG - Disposing DB connection pool (PID 19287)
# Start of repeated pattern until we restart the server.
[2019-08-29 03:50:55,375] {jobs.py:1573} DEBUG - Starting Loop...
[2019-08-29 03:50:55,376] {jobs.py:1584} DEBUG - Harvesting DAG parsing results
[2019-08-29 03:50:55,376] {jobs.py:1586} DEBUG - Harvested 0 SimpleDAGs
[2019-08-29 03:50:55,376] {jobs.py:1621} DEBUG - Heartbeating the executor
[2019-08-29 03:50:55,376] {base_executor.py:124} DEBUG - 0 running task instances
[2019-08-29 03:50:55,376] {base_executor.py:125} DEBUG - 0 in queue
[2019-08-29 03:50:55,376] {base_executor.py:126} DEBUG - 120 open slots
[2019-08-29 03:50:55,376] {base_executor.py:146} DEBUG - Calling the <class 'airflow.executors.local_executor.LocalExecutor'> sync method
[2019-08-29 03:50:55,377] {jobs.py:1642} DEBUG - Ran scheduling loop in 0.00 seconds
[2019-08-29 03:50:55,377] {jobs.py:1645} DEBUG - Sleeping for 1.00 seconds
[2019-08-29 03:50:56,378] {jobs.py:1663} DEBUG - Sleeping for 1.00 seconds to prevent excessive logging
# End of repeated pattern until we restart the server.
# Here is the (slightly different) pattern again.
[2019-08-29 03:50:57,379] {jobs.py:1573} DEBUG - Starting Loop...
[2019-08-29 03:50:57,379] {jobs.py:1584} DEBUG - Harvesting DAG parsing results
[2019-08-29 03:50:57,380] {jobs.py:1586} DEBUG - Harvested 0 SimpleDAGs
[2019-08-29 03:50:57,380] {jobs.py:1621} DEBUG - Heartbeating the executor
[2019-08-29 03:50:57,380] {base_executor.py:124} DEBUG - 0 running task instances
[2019-08-29 03:50:57,380] {base_executor.py:125} DEBUG - 0 in queue
[2019-08-29 03:50:57,380] {base_executor.py:126} DEBUG - 120 open slots
[2019-08-29 03:50:57,380] {base_executor.py:146} DEBUG - Calling the <class 'airflow.executors.local_executor.LocalExecutor'> sync method
[2019-08-29 03:50:57,380] {jobs.py:1633} DEBUG - Heartbeating the scheduler
[2019-08-29 03:50:57,391] {jobs.py:193} DEBUG - [heartbeat]
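In case it helps others check their own logs for the same pattern, here is a rough sketch that counts how many consecutive scheduler loops harvested 0 SimpleDAGs. The function name longest_idle_streak is made up for illustration, and the regex only assumes the "Harvested N SimpleDAGs" wording seen in the lines above; other Airflow versions may log this differently.

```python
import re
from typing import Iterable

# Matches the DEBUG message shown in the log excerpt, e.g. "Harvested 0 SimpleDAGs".
HARVEST_RE = re.compile(r"Harvested (\d+) SimpleDAGs")


def longest_idle_streak(lines: Iterable[str]) -> int:
    """Return the longest run of consecutive loops that harvested 0 DAGs.

    Hypothetical helper: lines that do not contain a harvest message are
    ignored, and any loop that harvested at least one DAG resets the run.
    """
    longest = 0
    current = 0
    for line in lines:
        match = HARVEST_RE.search(line)
        if match is None:
            continue  # not a harvest line, skip it
        if int(match.group(1)) == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest
```

A long streak on its own does not prove the scheduler is stuck (an idle system would look the same), so this is only a first filter before looking at the process itself.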
> scheduler process running (in ps) but not doing anything, not writing to log
> for 3+hrs and not processing tasks
> ---------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-4499
> URL: https://issues.apache.org/jira/browse/AIRFLOW-4499
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: 1.10.3
> Reporter: t oo
> Priority: Critical
>
> Blogs mention this as a long-standing issue, but I could not find an open JIRA
> for it.
> The scheduler process is running (in ps -ef) but not doing anything: it has not
> written to the log for 3+ hrs and is not processing tasks.
> Band-aid solution:
> new config value ---> scheduler_restart_mins = x
> Implement auto-restart of the scheduler process if the scheduler log file has not
> been updated within 2*x mins and the scheduler process start time is older than x mins.
> env: LocalExecutor
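The restart condition proposed in the description above could be sketched roughly as follows. This is not an existing Airflow feature: should_restart and its parameters are hypothetical names, and restart_mins stands in for the proposed scheduler_restart_mins = x setting. All timestamps are assumed to be seconds since the epoch.

```python
import time
from typing import Optional


def should_restart(
    log_mtime: float,
    process_start: float,
    restart_mins: float,
    now: Optional[float] = None,
) -> bool:
    """Decide whether an external watchdog should restart the scheduler.

    Hypothetical sketch of the proposed rule: restart only if the log file
    has not been updated within 2 * restart_mins minutes AND the process
    started more than restart_mins minutes ago.
    """
    if now is None:
        now = time.time()
    log_stale = (now - log_mtime) > 2 * restart_mins * 60
    process_old = (now - process_start) > restart_mins * 60
    return log_stale and process_old
```

In a real watchdog, log_mtime could come from os.path.getmtime on the scheduler log file, and the process start time from the operating system or a library such as psutil. How the restart itself is triggered (supervisor, systemd, cron) is left open here.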