bensonnd commented on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-823322869


   Following on from what @pelaprat mentioned: we are running neither the
CeleryExecutor nor the KubernetesExecutor, but the LocalExecutor in a Docker
container. Tasks get stuck in the scheduled or queued state, and the DAG is
marked as running even though nothing is executing. It looks as if the
scheduler falls asleep or misses queued tasks.
   
   Either clearing the queued tasks or restarting the scheduler with
`airflow scheduler` inside the container gets it moving again.
   
   We've repeatedly seen two distinct log patterns when it gets into this
stuck state: one where the scheduler detects zombie jobs, and one where it only
runs the regular heartbeat check.
   
   ```
   File Path                                       PID  Runtime      # DAGs    # Errors  Last Runtime    Last Run
   -------------------------------------------  ------  ---------  --------  ----------  --------------  -------------------
   /opt/ingest/batch_ingest/dags/ingest_dag.py  120318  4.02s             1           0  5.43s           2021-04-08T16:37:43
   ================================================================================
   [2021-04-08 16:37:58,444] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 16:37:58,445] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:32:58.445055+00:00
   [2021-04-08 16:37:58,455] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
   [2021-04-08 16:38:08,595] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 16:38:08,596] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:33:08.596291+00:00
   [2021-04-08 16:38:08,607] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
   [2021-04-08 16:38:18,650] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 16:38:18,651] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:33:18.651308+00:00
   [2021-04-08 16:38:18,661] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
   [2021-04-08 16:38:22,690] {dag_processing.py:838} INFO -
   ================================================================================
   DAG File Processing Stats
   ```
   
   or
   
   ```
   File Path                                    PID    Runtime      # DAGs    # Errors  Last Runtime    Last Run
   -------------------------------------------  -----  ---------  --------  ----------  --------------  -------------------
   /opt/ingest/batch_ingest/dags/ingest_dag.py                           1           0  1.52s           2021-04-08T18:29:22
   ================================================================================
   [2021-04-08 18:29:33,015] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 18:29:33,016] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 18:24:33.016077+00:00
   [2021-04-08 18:29:43,036] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 18:29:43,037] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 18:24:43.037136+00:00
   [2021-04-08 18:29:53,072] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
   [2021-04-08 18:29:53,072] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 18:24:53.072257+00:00
   [2021-04-08 18:29:53,080] {dag_processing.py:838} INFO -
   ================================================================================
   DAG File Processing Stats
   ```
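
For anyone comparing their own scheduler logs against the two excerpts above,
here is a rough sketch (not part of Airflow) that counts the "Detected zombie
job" lines and derives the heartbeat window from the "Failing jobs without
heartbeat after" lines. The helper name `summarize`, the regular expressions,
and the result keys are made up for illustration, and the sketch assumes the
bracketed log timestamps are in UTC, like the `+00:00` cutoff values.

```python
import re
from datetime import datetime, timezone

# Matches lines shaped like the excerpts above, e.g.
# [2021-04-08 16:37:58,445] {dag_processing.py:1075} INFO - Failing jobs ...
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"\{(?P<src>[^}]+)\} (?P<level>\w+) - (?P<msg>.*)"
)
CUTOFF = re.compile(r"Failing jobs without heartbeat after (?P<cutoff>\S+ \S+)")


def summarize(log_text):
    """Hypothetical helper: classify a log excerpt as 'zombie' or 'heartbeat-only'."""
    zombies = 0
    windows = set()
    for line in log_text.splitlines():
        match = LOG_LINE.search(line)
        if match is None:
            continue  # table rows, separators, blank lines
        msg = match.group("msg")
        if msg.startswith("Detected zombie job"):
            zombies += 1
        cutoff_match = CUTOFF.search(msg)
        if cutoff_match:
            # Assumption: the bracketed timestamp is UTC, like the cutoff.
            logged = datetime.strptime(
                match.group("ts"), "%Y-%m-%d %H:%M:%S,%f"
            ).replace(tzinfo=timezone.utc)
            cutoff = datetime.fromisoformat(cutoff_match.group("cutoff"))
            # Distance between "now" and the cutoff, rounded to whole seconds.
            windows.add(round((logged - cutoff).total_seconds()))
    return {
        "zombies": zombies,
        "window_seconds": sorted(windows),
        "pattern": "zombie" if zombies else "heartbeat-only",
    }


# Three lines copied from the first excerpt above.
SAMPLE = """\
[2021-04-08 16:37:58,444] {dag_processing.py:1071} INFO - Finding 'running' jobs without a recent heartbeat
[2021-04-08 16:37:58,445] {dag_processing.py:1075} INFO - Failing jobs without heartbeat after 2021-04-08 16:32:58.445055+00:00
[2021-04-08 16:37:58,455] {dag_processing.py:1098} INFO - Detected zombie job: {'full_filepath': '/opt/ingest/batch_ingest/dags/ingest_dag.py', 'msg': 'Detected as zombie', 'simple_task_instance': <airflow.models.taskinstance.Si>
"""

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

On the sample lines this reports one zombie detection and a window of 300
seconds, which matches the five-minute gap visible between each log timestamp
and its cutoff in both excerpts.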
   
   We are in the process of deploying 2.0.2, as @kaxil suggested, to see
whether that resolves the issue.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

