millin edited a comment on issue #16227:
URL: https://github.com/apache/airflow/issues/16227#issuecomment-853824350


   Hi everyone!
   
   Looks like we are faced with a related issue.
   After upgrading Airflow to 2.1.0 our sensors in reshedule mode started 
getting SIGTERM from previous poke, thats should be already finished, but 
really not finished yet.
   
   Here my shortened debug logs of this:
   ```
   [2021-05-31 15:00:57,133] {taskinstance.py:876} INFO - Dependencies all met 
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
   [2021-05-31 15:00:57,214] {taskinstance.py:876} INFO - Dependencies all met 
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
   [2021-05-31 15:00:57,214] {taskinstance.py:1067} INFO -
   
--------------------------------------------------------------------------------
   [2021-05-31 15:00:57,214] {taskinstance.py:1068} INFO - Starting attempt 1 
of 1
   [2021-05-31 15:00:57,215] {taskinstance.py:1069} INFO -
   
--------------------------------------------------------------------------------
   [2021-05-31 15:00:57,239] {taskinstance.py:1087} INFO - Executing 
<Task(ExternalTaskSensor): sensor__stg.orders> on 2021-05-31T11:45:00+00:00
   [2021-05-31 15:00:57,245] {standard_task_runner.py:52} INFO - Started 
process 468 to run task
   [2021-05-31 15:00:57,258] {standard_task_runner.py:76} INFO - Running: 
['airflow', 'tasks', 'run', 'dds-15', 'sensor__stg.orders', 
'2021-05-31T11:45:00+00:00', '--job-id', '1252746', '--pool', 
'stg_sensors_pool', '--raw', '--subdir', 'DAGS_FOLDER/stg-dds/dds.py', 
'--cfg-path', '/tmp/tmpob2wq8qo', '--error-file', '/tmp/tmpnk04mw7o']
   [2021-05-31 15:00:57,261] {standard_task_runner.py:77} INFO - Job 1252746: 
Subtask sensor__stg.orders
   [2021-05-31 15:00:57,267] {cli_action_loggers.py:66} DEBUG - Calling 
callbacks: [<function default_action_log at 0x7f25ac02a160>]
   [2021-05-31 15:00:57,307] {settings.py:210} DEBUG - Setting up DB connection 
pool (PID 468)
   [2021-05-31 15:00:57,308] {settings.py:246} DEBUG - 
settings.prepare_engine_args(): Using NullPool
   [2021-05-31 15:00:57,318] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[None]> from DB
   [2021-05-31 15:00:57,418] {taskinstance.py:629} DEBUG - Refreshed 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]>
   [2021-05-31 15:00:57,425] {logging_mixin.py:104} INFO - Running 
<TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [running]> 
on host airflow-dwh-01
   [2021-05-31 15:00:57,441] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]> from DB
   [2021-05-31 15:00:57,467] {taskinstance.py:629} DEBUG - Refreshed 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]>
   [2021-05-31 15:00:57,580] {external_task.py:151} INFO - Poking for 
dwh-stg-15.marker_dds__orders on 2021-05-31T11:45:00+00:00 ...
   [2021-05-31 15:00:57,809] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]> from DB
   [2021-05-31 15:00:57,830] {taskinstance.py:629} DEBUG - Refreshed 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]>
   [2021-05-31 15:00:57,833] {taskinstance.py:1888} DEBUG - Task Duration set 
to 0.699986
   [2021-05-31 15:00:57,860] {cli_action_loggers.py:84} DEBUG - Calling 
callbacks: []
   [2021-05-31 15:00:57,860] {taskinstance.py:1464} INFO - Rescheduling task, 
marking task as UP_FOR_RESCHEDULE
   [2021-05-31 15:00:57,877] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]> from DB
   [2021-05-31 15:00:57,877] {local_task_job.py:151} INFO - Task exited with 
return code 0
   ```
   At this moment the process (PID 468) should have been finished, but as we 
see in below logs it continued to make heartbeats and kills next poke process:
   ```
   [2021-05-31 15:01:21,365] {taskinstance.py:876} INFO - Dependencies all met 
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
   [2021-05-31 15:01:21,414] {taskinstance.py:876} INFO - Dependencies all met 
for <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [queued]>
   [2021-05-31 15:01:21,414] {taskinstance.py:1067} INFO -
   
--------------------------------------------------------------------------------
   [2021-05-31 15:01:21,415] {taskinstance.py:1068} INFO - Starting attempt 1 
of 1
   [2021-05-31 15:01:21,416] {taskinstance.py:1069} INFO -
   
--------------------------------------------------------------------------------
   [2021-05-31 15:01:21,434] {taskinstance.py:1087} INFO - Executing 
<Task(ExternalTaskSensor): sensor__stg.orders> on 2021-05-31T11:45:00+00:00
   [2021-05-31 15:01:21,455] {standard_task_runner.py:52} INFO - Started 
process 542 to run task
   [2021-05-31 15:01:21,472] {standard_task_runner.py:76} INFO - Running: 
['airflow', 'tasks', 'run', 'dds-15', 'sensor__stg.orders', 
'2021-05-31T11:45:00+00:00', '--job-id', '1252826', '--pool', 
'stg_sensors_pool', '--raw', '--subdir', 'DAGS_FOLDER/stg-dds/dds.py', 
'--cfg-path', '/tmp/tmptax87gf8', '--error-file', '/tmp/tmpr0631j5m']
   [2021-05-31 15:01:21,482] {standard_task_runner.py:77} INFO - Job 1252826: 
Subtask sensor__stg.orders
   [2021-05-31 15:01:21,484] {cli_action_loggers.py:66} DEBUG - Calling 
callbacks: [<function default_action_log at 0x7f4751597160>]
   [2021-05-31 15:01:26,521] {settings.py:210} DEBUG - Setting up DB connection 
pool (PID 542)
   [2021-05-31 15:01:26,522] {settings.py:246} DEBUG - 
settings.prepare_engine_args(): Using NullPool
   [2021-05-31 15:01:26,529] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[None]> from DB
   [2021-05-31 15:01:26,547] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]> from DB
   [2021-05-31 15:01:26,562] {taskinstance.py:629} DEBUG - Refreshed 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]>
   [2021-05-31 15:01:26,568] {logging_mixin.py:104} INFO - Running 
<TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 [running]> 
on host airflow-dwh-02
   [2021-05-31 15:01:26,572] {taskinstance.py:594} DEBUG - Refreshing 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]> from DB
   [2021-05-31 15:01:26,573] {taskinstance.py:629} DEBUG - Refreshed 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]>
   [2021-05-31 15:01:26,587] {local_task_job.py:193} WARNING - Recorded pid 468 
does not match the current pid 542
   [2021-05-31 15:01:26,592] {taskinstance.py:629} DEBUG - Refreshed 
TaskInstance <TaskInstance: dds-15.sensor__stg.orders 2021-05-31T11:45:00+00:00 
[running]>
   [2021-05-31 15:01:26,596] {process_utils.py:100} INFO - Sending 
Signals.SIGTERM to GPID 542
   [2021-05-31 15:01:26,613] {process_utils.py:66} INFO - Process 
psutil.Process(pid=542, status='terminated', exitcode=<Negsignal.SIGTERM: -15>, 
started='15:01:20') (542) terminated with exit code Negsignal.SIGTERM
   ```
   
   I found that this bug began to affect our sensors after fixing incorrect 
validation of recorded pid by commit 817b599.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to