millin edited a comment on issue #16227:
URL: https://github.com/apache/airflow/issues/16227#issuecomment-853824350
Hi everyone!
Looks like we are faced with a related issue.
After upgrading Airflow to 2.1.0 our sensors in reshedule mode started
getting SIGTERM from previous poke, thats should be already finished, but
really not finished yet.
Here my shortened debug logs of this:
```
[2021-05-31 15:00:57,133] {taskinstance.py:876} INFO - Dependencies all met
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
[2021-05-31 15:00:57,214] {taskinstance.py:876} INFO - Dependencies all met
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
[2021-05-31 15:00:57,214] {taskinstance.py:1067} INFO -
--------------------------------------------------------------------------------
[2021-05-31 15:00:57,214] {taskinstance.py:1068} INFO - Starting attempt 1
of 1
[2021-05-31 15:00:57,215] {taskinstance.py:1069} INFO -
--------------------------------------------------------------------------------
[2021-05-31 15:00:57,239] {taskinstance.py:1087} INFO - Executing
<Task(ExternalTaskSensor): sensor__stg.orders> on 2021-05-31T11:45:00+00:00
[2021-05-31 15:00:57,245] {standard_task_runner.py:52} INFO - Started
process 468 to run task
[2021-05-31 15:00:57,258] {standard_task_runner.py:76} INFO - Running:
['airflow', 'tasks', 'run', 'dds-15', 'sensor__stg.orders',
'2021-05-31T11:45:00+00:00', '--job-id', '1252746', '--pool',
'stg_sensors_pool', '--raw', '--subdir', 'DAGS_FOLDER/stg-dds/dds.py',
'--cfg-path', '/tmp/tmpob2wq8qo', '--error-file', '/tmp/tmpnk04mw7o']
[2021-05-31 15:00:57,261] {standard_task_runner.py:77} INFO - Job 1252746:
Subtask sensor__stg.orders
[2021-05-31 15:00:57,267] {cli_action_loggers.py:66} DEBUG - Calling
callbacks: [<function default_action_log at 0x7f25ac02a160>]
[2021-05-31 15:00:57,307] {settings.py:210} DEBUG - Setting up DB connection
pool (PID 468)
[2021-05-31 15:00:57,308] {settings.py:246} DEBUG -
settings.prepare_engine_args(): Using NullPool
[2021-05-31 15:00:57,318] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[None]> from DB
[2021-05-31 15:00:57,418] {taskinstance.py:629} DEBUG - Refreshed
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]>
[2021-05-31 15:00:57,425] {logging_mixin.py:104} INFO - Running
<TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [running]>
on host airflow-dwh-01
[2021-05-31 15:00:57,441] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]> from DB
[2021-05-31 15:00:57,467] {taskinstance.py:629} DEBUG - Refreshed
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]>
[2021-05-31 15:00:57,580] {external_task.py:151} INFO - Poking for
dwh-stg-15.marker_dds__orders on 2021-05-31T11:45:00+00:00 ...
[2021-05-31 15:00:57,809] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]> from DB
[2021-05-31 15:00:57,830] {taskinstance.py:629} DEBUG - Refreshed
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]>
[2021-05-31 15:00:57,833] {taskinstance.py:1888} DEBUG - Task Duration set
to 0.699986
[2021-05-31 15:00:57,860] {cli_action_loggers.py:84} DEBUG - Calling
callbacks: []
[2021-05-31 15:00:57,860] {taskinstance.py:1464} INFO - Rescheduling task,
marking task as UP_FOR_RESCHEDULE
[2021-05-31 15:00:57,877] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]> from DB
[2021-05-31 15:00:57,877] {local_task_job.py:151} INFO - Task exited with
return code 0
```
At this moment the process (PID 468) should have been finished, but as we
see in below logs it continued to make heartbeats and kills next poke process:
```
[2021-05-31 15:01:21,365] {taskinstance.py:876} INFO - Dependencies all met
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
[2021-05-31 15:01:21,414] {taskinstance.py:876} INFO - Dependencies all met
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
[2021-05-31 15:01:21,414] {taskinstance.py:1067} INFO -
--------------------------------------------------------------------------------
[2021-05-31 15:01:21,415] {taskinstance.py:1068} INFO - Starting attempt 1
of 1
[2021-05-31 15:01:21,416] {taskinstance.py:1069} INFO -
--------------------------------------------------------------------------------
[2021-05-31 15:01:21,434] {taskinstance.py:1087} INFO - Executing
<Task(ExternalTaskSensor): sensor__stg.orders> on 2021-05-31T11:45:00+00:00
[2021-05-31 15:01:21,455] {standard_task_runner.py:52} INFO - Started
process 542 to run task
[2021-05-31 15:01:21,472] {standard_task_runner.py:76} INFO - Running:
['airflow', 'tasks', 'run', 'dds-15', 'sensor__stg.orders',
'2021-05-31T11:45:00+00:00', '--job-id', '1252826', '--pool',
'stg_sensors_pool', '--raw', '--subdir', 'DAGS_FOLDER/stg-dds/dds.py',
'--cfg-path', '/tmp/tmptax87gf8', '--error-file', '/tmp/tmpr0631j5m']
[2021-05-31 15:01:21,482] {standard_task_runner.py:77} INFO - Job 1252826:
Subtask sensor__stg.orders
[2021-05-31 15:01:21,484] {cli_action_loggers.py:66} DEBUG - Calling
callbacks: [<function default_action_log at 0x7f4751597160>]
[2021-05-31 15:01:26,521] {settings.py:210} DEBUG - Setting up DB connection
pool (PID 542)
[2021-05-31 15:01:26,522] {settings.py:246} DEBUG -
settings.prepare_engine_args(): Using NullPool
[2021-05-31 15:01:26,529] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[None]> from DB
[2021-05-31 15:01:26,547] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]> from DB
[2021-05-31 15:01:26,562] {taskinstance.py:629} DEBUG - Refreshed
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]>
[2021-05-31 15:01:26,568] {logging_mixin.py:104} INFO - Running
<TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [running]>
on host airflow-dwh-02
[2021-05-31 15:01:26,572] {taskinstance.py:594} DEBUG - Refreshing
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]> from DB
[2021-05-31 15:01:26,573] {taskinstance.py:629} DEBUG - Refreshed
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]>
[2021-05-31 15:01:26,587] {local_task_job.py:193} WARNING - Recorded pid 468
does not match the current pid 542
[2021-05-31 15:01:26,592] {taskinstance.py:629} DEBUG - Refreshed
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00
[running]>
[2021-05-31 15:01:26,596] {process_utils.py:100} INFO - Sending
Signals.SIGTERM to GPID 542
[2021-05-31 15:01:26,613] {process_utils.py:66} INFO - Process
psutil.Process(pid=542, status='terminated', exitcode=<Negsignal.SIGTERM: -15>,
started='15:01:20') (542) terminated with exit code Negsignal.SIGTERM
```
I found that this bug began to affect our sensors after fixing incorrect
validation of recorded pid by commit 817b599.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]