shivanshs9 edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-861406671


   > `kill -USR2 <pid of scheduler>` -- how you get the pid depends upon how 
and where you are running it :)
   > 
   > Likely exec in to the container, run `ps auxww` and find the oldest 
scheduler processs (you'll see some sub processes, possibly named helpfully).
   
   @ashb @Jorricks  I'm facing a similar problem as OP on Airflow v2.0.1 with 
Celery Executor. If it helps, you can find the debug information from scheduler 
logs:
   
   ```
   [2021-06-15 07:13:48,623] {{dag_processing.py:1071}} INFO - Finding 
'running' jobs without a recent heartbeat
   [2021-06-15 07:13:48,624] {{dag_processing.py:1075}} INFO - Failing jobs 
without heartbeat after 2021-06-15 07:08:48.624534+00:00
   [2021-06-15 07:13:49,539] {{scheduler_job.py:757}} INFO - 
--------------------------------------------------------------------------------
   SIGUSR2 received, printing debug
   
--------------------------------------------------------------------------------
   [2021-06-15 07:13:49,539] {{base_executor.py:302}} INFO - executor.queued (0)
        
   [2021-06-15 07:13:49,539] {{base_executor.py:307}} INFO - executor.running 
(40)
        TaskInstanceKey(dag_id='ergo_job_collector', 
task_id='process_job_result', execution_date=datetime.datetime(2021, 6, 15, 4, 
39, 10, 753973, tzinfo=Timezone('UTC')), try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 13, 22, 40, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='ergo_task_queuer', task_id='push_tasks', 
execution_date=datetime.datetime(2021, 6, 15, 4, 39, 1, 316420, 
tzinfo=Timezone('UTC')), try_number=1)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_4', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10, 
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=32)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_1', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7, 
3, 23, 693482, tzinfo=Timezone('UTC')), try_number=41)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 13, 9, 10, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 13, 9, 10, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_3', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6, 
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=39)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 11, 6, 10, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 11, 6, 30, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 13, 22, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 12, 21, 10, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 13, 9, 16, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 11, 6, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 12, 21, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 5, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_0', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7, 
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=42)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_2', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6, 
58, 43, 797327, tzinfo=Timezone('UTC')), try_number=40)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 32, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 12, 21, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 15, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 10, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 15, 3, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 15, 0, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 12, 21, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 30, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_3', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6, 
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=41)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_4', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10, 
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=30)
        TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 11, 6, 0, tzinfo=Timezone('UTC')), 
try_number=1)
        TaskInstanceKey(dag_id='smart_sensor_group_shard_0', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7, 
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=39)
   [2021-06-15 07:13:49,540] {{base_executor.py:308}} INFO - 
executor.event_buffer (0)
        
   [2021-06-15 07:13:49,540] {{celery_executor.py:387}} INFO - executor.tasks 
(27)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_4', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10, 
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=30), <AsyncResult: 
6960c6d0-7e21-4ef9-8f04-d95efdd9d706>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_2', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6, 
58, 43, 797327, tzinfo=Timezone('UTC')), try_number=40), <AsyncResult: 
4a815d59-d824-4373-8f96-34272174cfc0>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_3', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6, 
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=39), <AsyncResult: 
5296e550-5efd-474a-9e34-897330504886>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_0', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7, 
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=39), <AsyncResult: 
cad2a01b-cde3-473e-85bc-9f96eae7da7c>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: cae54db9-56a5-4a27-8226-1dc5867265f7>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 438e5379-ef8d-4bfe-8e55-cae461a3f62f>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 9a953b60-b222-4d4c-aff4-5b831caef71f>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 840e404c-1cee-4c18-8123-b354a63e8d80>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 4410416b-7132-4713-a78d-caa0ce3cacb0>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 91d5060a-ff5c-4997-832a-757c25069ac7>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 10, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: ade2d701-f632-41d9-a901-40531619dfba>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: febb3b1e-0f63-41d3-83dc-58f5b11a39ca>)
        (TaskInstanceKey(dag_id='ergo_job_collector', 
task_id='process_job_result', execution_date=datetime.datetime(2021, 6, 15, 4, 
39, 10, 753973, tzinfo=Timezone('UTC')), try_number=1), <AsyncResult: 
390f1451-7f44-4415-8c57-e2542ed6b6d2>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 30, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 7926f732-e671-4f64-afaf-7c352a85a929>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 32, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: ba1edc8e-66b4-4a20-9cc6-5b0c5758d8c4>)
        (TaskInstanceKey(dag_id='ergo_task_queuer', task_id='push_tasks', 
execution_date=datetime.datetime(2021, 6, 15, 4, 39, 1, 316420, 
tzinfo=Timezone('UTC')), try_number=1), <AsyncResult: 
0ef80d4f-1c4c-4e62-b5ed-cebd0b3ba075>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: fea3edfa-477d-4fb0-a580-03ea91faaef1>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher', 
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 4749177d-df70-49cf-a2fb-9b0a230d5ff4>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 15, 0, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: bf04f105-efc7-41be-864c-8161c4973f6b>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued', 
execution_date=datetime.datetime(2021, 6, 15, 3, 0, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 7c8c6550-cdcc-4405-816b-15da1dbeb79e>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 15, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: ee255fe6-21d2-4d3f-8860-83e55265fea9>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: 46b77d60-43b5-41e6-99eb-e407e0d74791>)
        (TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor', 
execution_date=datetime.datetime(2021, 6, 15, 4, 5, tzinfo=Timezone('UTC')), 
try_number=1), <AsyncResult: ed5115f9-dec6-49da-a5bf-15860a907ab4>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_4', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10, 
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=32), <AsyncResult: 
49ee2d3f-eef1-4f7a-a5da-923e8cb8b914>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_1', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7, 
3, 23, 693482, tzinfo=Timezone('UTC')), try_number=41), <AsyncResult: 
910a1d67-658d-4fe9-954a-2d093b64c95d>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_0', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7, 
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=42), <AsyncResult: 
8061f49f-8050-4b73-84fd-4a0222369040>)
        (TaskInstanceKey(dag_id='smart_sensor_group_shard_3', 
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6, 
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=41), <AsyncResult: 
262cf0ac-1132-4176-aba0-77ec4a2bddb7>)
   [2021-06-15 07:13:49,540] {{celery_executor.py:390}} INFO - 
executor.adopted_task_timeouts (0)
        
   [2021-06-15 07:13:49,540] {{scheduler_job.py:760}} INFO - 
--------------------------------------------------------------------------------
   [2021-06-15 07:13:51,028] {{dag_processing.py:838}} INFO -
   
================================================================================
   DAG File Processing Stats
   
   File Path                                                                    
                       PID  Runtime      # DAGs    # Errors  Last Runtime    
Last Run
   
------------------------------------------------------------------------------------------------
  -----  ---------  --------  ----------  --------------  -------------------
   
/home/airflow/.local/lib/python3.8/site-packages/airflow/smart_sensor_dags/__init__.py
                                     0           0  0.09s           
2021-06-15T07:13:50
   /opt/airflow/dags/dag_ergo.py                                                
                                              2           0  0.47s           
2021-06-15T07:13:51
   
/home/airflow/.local/lib/python3.8/site-packages/airflow/smart_sensor_dags/smart_sensor_group.py
  16087  0.00s             5           0  0.15s           2021-06-15T07:13:50
   /opt/airflow/dags/dag_dagen.py                                               
                     16081  1.48s            90           0  4.10s           
2021-06-15T07:13:49
   
================================================================================
   ```
   
   Workers have already finished executing these tasks but the scheduler is 
stuck and doesn't schedule any new tasks. The only fix that works is by 
manually terminating and restarting the scheduler pod.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to