shivanshs9 edited a comment on issue #13542:
URL: https://github.com/apache/airflow/issues/13542#issuecomment-861406671
> `kill -USR2 <pid of scheduler>` -- how you get the pid depends upon how and where you are running it :)
>
> Likely exec into the container, run `ps auxww` and find the oldest scheduler process (you'll see some sub processes, possibly named helpfully).
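For anyone following along, the quoted steps could be scripted roughly as below. This is only a sketch under assumptions: it assumes a Linux `ps` (procps) that supports `-eo pid,etimes,args`, that the scheduler's command line contains the string `airflow scheduler`, and that "oldest" means the largest elapsed time. The function names `oldest_scheduler_pid` and `signal_scheduler` are made up for illustration and are not part of Airflow.

```python
import os
import signal
import subprocess
from typing import Optional


def oldest_scheduler_pid(ps_output: str, needle: str = "airflow scheduler") -> Optional[int]:
    """Return the PID of the longest-running process whose command line contains `needle`.

    `ps_output` is expected to be the text printed by `ps -eo pid,etimes,args`
    (PID, elapsed seconds, full command line), header row included.
    """
    best_pid, best_age = None, -1
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)
        if len(parts) < 3 or not parts[0].isdigit() or not parts[1].isdigit():
            continue  # ignore rows that do not look like "PID ELAPSED COMMAND"
        pid, age, args = int(parts[0]), int(parts[1]), parts[2]
        if needle in args and age > best_age:
            best_pid, best_age = pid, age
    return best_pid


def signal_scheduler() -> None:
    """Find the oldest matching process and send it SIGUSR2 to trigger the debug dump."""
    out = subprocess.run(
        ["ps", "-eo", "pid,etimes,args"], capture_output=True, text=True, check=True
    ).stdout
    pid = oldest_scheduler_pid(out)
    if pid is None:
        raise SystemExit("no scheduler process found")
    os.kill(pid, signal.SIGUSR2)  # same effect as `kill -USR2 <pid>`
```

Calling `signal_scheduler()` inside the scheduler container would then be the equivalent of the manual `ps` plus `kill -USR2`. If the container runs the scheduler under an init wrapper whose own command line also contains `airflow scheduler`, the wrapper may be picked instead, so check the PID before relying on this.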
@ashb @Jorricks I'm facing a similar problem to the OP on Airflow v2.0.1 with
the Celery Executor. In case it helps, here is the SIGUSR2 debug output from the
scheduler logs:
```
[2021-06-15 07:13:48,623] {{dag_processing.py:1071}} INFO - Finding
'running' jobs without a recent heartbeat
[2021-06-15 07:13:48,624] {{dag_processing.py:1075}} INFO - Failing jobs
without heartbeat after 2021-06-15 07:08:48.624534+00:00
[2021-06-15 07:13:49,539] {{scheduler_job.py:757}} INFO -
--------------------------------------------------------------------------------
SIGUSR2 received, printing debug
--------------------------------------------------------------------------------
[2021-06-15 07:13:49,539] {{base_executor.py:302}} INFO - executor.queued (0)
[2021-06-15 07:13:49,539] {{base_executor.py:307}} INFO - executor.running
(40)
TaskInstanceKey(dag_id='ergo_job_collector',
task_id='process_job_result', execution_date=datetime.datetime(2021, 6, 15, 4,
39, 10, 753973, tzinfo=Timezone('UTC')), try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 13, 22, 40, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='ergo_task_queuer', task_id='push_tasks',
execution_date=datetime.datetime(2021, 6, 15, 4, 39, 1, 316420,
tzinfo=Timezone('UTC')), try_number=1)
TaskInstanceKey(dag_id='smart_sensor_group_shard_4',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10,
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=32)
TaskInstanceKey(dag_id='smart_sensor_group_shard_1',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7,
3, 23, 693482, tzinfo=Timezone('UTC')), try_number=41)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 13, 9, 10, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 13, 9, 10, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='smart_sensor_group_shard_3',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6,
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=39)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 11, 6, 10, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 11, 6, 30, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 13, 22, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 12, 21, 10, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 13, 9, 16, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 11, 6, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 12, 21, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 5, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='smart_sensor_group_shard_0',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7,
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=42)
TaskInstanceKey(dag_id='smart_sensor_group_shard_2',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6,
58, 43, 797327, tzinfo=Timezone('UTC')), try_number=40)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 32, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 12, 21, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 15, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 10, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 15, 3, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 15, 0, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 12, 21, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 30, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='smart_sensor_group_shard_3',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6,
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=41)
TaskInstanceKey(dag_id='smart_sensor_group_shard_4',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10,
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=30)
TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 11, 6, 0, tzinfo=Timezone('UTC')),
try_number=1)
TaskInstanceKey(dag_id='smart_sensor_group_shard_0',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7,
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=39)
[2021-06-15 07:13:49,540] {{base_executor.py:308}} INFO -
executor.event_buffer (0)
[2021-06-15 07:13:49,540] {{celery_executor.py:387}} INFO - executor.tasks
(27)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_4',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10,
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=30), <AsyncResult:
6960c6d0-7e21-4ef9-8f04-d95efdd9d706>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_2',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6,
58, 43, 797327, tzinfo=Timezone('UTC')), try_number=40), <AsyncResult:
4a815d59-d824-4373-8f96-34272174cfc0>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_3',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6,
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=39), <AsyncResult:
5296e550-5efd-474a-9e34-897330504886>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_0',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7,
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=39), <AsyncResult:
cad2a01b-cde3-473e-85bc-9f96eae7da7c>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: cae54db9-56a5-4a27-8226-1dc5867265f7>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 438e5379-ef8d-4bfe-8e55-cae461a3f62f>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 9a953b60-b222-4d4c-aff4-5b831caef71f>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 840e404c-1cee-4c18-8123-b354a63e8d80>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 4410416b-7132-4713-a78d-caa0ce3cacb0>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 91d5060a-ff5c-4997-832a-757c25069ac7>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 10, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: ade2d701-f632-41d9-a901-40531619dfba>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: febb3b1e-0f63-41d3-83dc-58f5b11a39ca>)
(TaskInstanceKey(dag_id='ergo_job_collector',
task_id='process_job_result', execution_date=datetime.datetime(2021, 6, 15, 4,
39, 10, 753973, tzinfo=Timezone('UTC')), try_number=1), <AsyncResult:
390f1451-7f44-4415-8c57-e2542ed6b6d2>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 30, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 7926f732-e671-4f64-afaf-7c352a85a929>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 32, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: ba1edc8e-66b4-4a20-9cc6-5b0c5758d8c4>)
(TaskInstanceKey(dag_id='ergo_task_queuer', task_id='push_tasks',
execution_date=datetime.datetime(2021, 6, 15, 4, 39, 1, 316420,
tzinfo=Timezone('UTC')), try_number=1), <AsyncResult:
0ef80d4f-1c4c-4e62-b5ed-cebd0b3ba075>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 20, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: fea3edfa-477d-4fb0-a580-03ea91faaef1>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_pusher',
execution_date=datetime.datetime(2021, 6, 15, 4, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 4749177d-df70-49cf-a2fb-9b0a230d5ff4>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 15, 0, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: bf04f105-efc7-41be-864c-8161c4973f6b>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_task_queued',
execution_date=datetime.datetime(2021, 6, 15, 3, 0, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 7c8c6550-cdcc-4405-816b-15da1dbeb79e>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 15, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: ee255fe6-21d2-4d3f-8860-83e55265fea9>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 14, 22, 20, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: 46b77d60-43b5-41e6-99eb-e407e0d74791>)
(TaskInstanceKey(dag_id='[REDACTED]', task_id='chronos_job_sensor',
execution_date=datetime.datetime(2021, 6, 15, 4, 5, tzinfo=Timezone('UTC')),
try_number=1), <AsyncResult: ed5115f9-dec6-49da-a5bf-15860a907ab4>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_4',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 10,
54, 17, 679502, tzinfo=Timezone('UTC')), try_number=32), <AsyncResult:
49ee2d3f-eef1-4f7a-a5da-923e8cb8b914>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_1',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7,
3, 23, 693482, tzinfo=Timezone('UTC')), try_number=41), <AsyncResult:
910a1d67-658d-4fe9-954a-2d093b64c95d>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_0',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 7,
0, 2, 374657, tzinfo=Timezone('UTC')), try_number=42), <AsyncResult:
8061f49f-8050-4b73-84fd-4a0222369040>)
(TaskInstanceKey(dag_id='smart_sensor_group_shard_3',
task_id='smart_sensor_task', execution_date=datetime.datetime(2021, 6, 14, 6,
58, 43, 797668, tzinfo=Timezone('UTC')), try_number=41), <AsyncResult:
262cf0ac-1132-4176-aba0-77ec4a2bddb7>)
[2021-06-15 07:13:49,540] {{celery_executor.py:390}} INFO -
executor.adopted_task_timeouts (0)
[2021-06-15 07:13:49,540] {{scheduler_job.py:760}} INFO -
--------------------------------------------------------------------------------
[2021-06-15 07:13:51,028] {{dag_processing.py:838}} INFO -
================================================================================
DAG File Processing Stats
File Path                                                                                           PID    Runtime    # DAGs    # Errors    Last Runtime  Last Run
------------------------------------------------------------------------------------------------  -----  ---------  --------  ----------  --------------  -------------------
/home/airflow/.local/lib/python3.8/site-packages/airflow/smart_sensor_dags/__init__.py                                     0           0           0.09s  2021-06-15T07:13:50
/opt/airflow/dags/dag_ergo.py                                                                                              2           0           0.47s  2021-06-15T07:13:51
/home/airflow/.local/lib/python3.8/site-packages/airflow/smart_sensor_dags/smart_sensor_group.py  16087      0.00s         5           0           0.15s  2021-06-15T07:13:50
/opt/airflow/dags/dag_dagen.py                                                                    16081      1.48s        90           0           4.10s  2021-06-15T07:13:49
================================================================================
```
The workers have already finished executing these tasks, but the scheduler is
stuck and does not schedule any new ones. The only fix that works is manually
terminating and restarting the scheduler pod.
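In the dump above, `executor.running` lists 40 keys while `executor.tasks` lists only 27, so some keys the executor counts as running have no Celery `AsyncResult` attached. To list them without comparing by eye, something like the following sketch might help. It assumes the dump has been saved as plain text in the format shown above (hard-wrapped lines are fine), and the names `parse_dump` and `untracked_running` are hypothetical helpers, not Airflow APIs. Because several `dag_id` values are redacted to the same string, keys are compared as a multiset rather than a set.

```python
import re
from collections import Counter
from typing import Dict

# Matches one TaskInstanceKey(...) repr; whitespace is flexible because the
# log lines may have been wrapped by the mail or terminal.
KEY_RE = re.compile(
    r"TaskInstanceKey\(dag_id='([^']*)',\s*task_id='([^']*)',\s*"
    r"execution_date=datetime\.datetime\(([^()]*?),\s*tzinfo=[^()]*\([^()]*\)\),\s*"
    r"try_number=(\d+)\)"
)
# Matches section headers such as "executor.running (40)".
SECTION_RE = re.compile(r"executor\.(\w+)\s*\((\d+)\)")


def parse_dump(text: str) -> Dict[str, Counter]:
    """Map each executor.<name> section to a Counter of the keys listed under it."""
    flat = " ".join(text.split())  # undo hard wrapping
    pieces = SECTION_RE.split(flat)
    # pieces = [preamble, name, count, body, name, count, body, ...]
    sections: Dict[str, Counter] = {}
    for i in range(1, len(pieces), 3):
        name, body = pieces[i], pieces[i + 2]
        sections[name] = Counter(KEY_RE.findall(body))
    return sections


def untracked_running(text: str) -> Counter:
    """Keys present under executor.running but absent from executor.tasks."""
    sections = parse_dump(text)
    return sections.get("running", Counter()) - sections.get("tasks", Counter())
```

Each key in the result is a tuple of `(dag_id, task_id, execution_date fields, try_number)` as strings. Usage would be along the lines of `for key in untracked_running(open("dump.txt").read()).elements(): print(key)`, where `dump.txt` is a placeholder file name.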