MartinKChen opened a new issue #19043:
URL: https://github.com/apache/airflow/issues/19043


   ### Apache Airflow version
   
   2.0.2
   
   ### Operating System
   
   Linux (AWS MWAA)
   
   ### Versions of Apache Airflow Providers
   
   MWAA with Apache Airflow 2.0.2
   
   ### Deployment
   
   MWAA
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   Shard tasks of SmartSensor keep getting terminated at random, with no explanatory logs.
   
   **Case 1: Shard terminated after tasks were sensed, without any explanatory log.**
   [2021-10-18 07:10:15,580] {{smart_sensor.py:358}} INFO - Performance query 0 tis, time: 0.008
   [2021-10-18 07:10:15,605] {{smart_sensor.py:373}} INFO - 0 tasks detected.
   [2021-10-18 07:10:15,633] {{smart_sensor.py:739}} INFO - Loaded 0 sensor_works
   [2021-10-18 07:10:15,666] {{smart_sensor.py:747}} INFO - Taking 0.09376 to execute 0 tasks.
   [2021-10-18 07:13:15,671] {{smart_sensor.py:358}} INFO - Performance query 0 tis, time: 0.014
   [2021-10-18 07:13:15,736] {{smart_sensor.py:373}} INFO - 0 tasks detected.
   [2021-10-18 07:13:15,768] {{smart_sensor.py:739}} INFO - Loaded 0 sensor_works
   [2021-10-18 07:13:15,844] {{smart_sensor.py:747}} INFO - Taking 0.186399 to execute 0 tasks.
   [2021-10-18 07:16:15,794] {{smart_sensor.py:358}} INFO - Performance query 4 tis, time: 0.008
   [2021-10-18 07:35:25,736] {{process_utils.py:100}} INFO - Sending Signals.SIGTERM to GPID 13284
   [2021-10-18 07:35:25,996] {{process_utils.py:66}} INFO - Process psutil.Process(pid=13284, status='terminated', exitcode=1, started='07:10:13') (13284) terminated with exit code 1
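
In the Case 1 log, the last `smart_sensor.py` entry is at 07:16:15 and the SIGTERM arrives at 07:35:25, with nothing logged in between. A minimal sketch for spotting such silent stretches in a downloaded task log follows. The helper name `find_log_gaps` and the 200-second threshold are assumptions made for illustration (the threshold is chosen slightly above the roughly 3-minute spacing visible in the log); none of this is part of Airflow.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" timestamp of a task log line.
TS_RE = re.compile(r"^\s*\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\]")


def find_log_gaps(lines, max_gap_seconds=200):
    """Return (previous_line, next_line, gap_seconds) for each silent gap.

    Hypothetical helper: max_gap_seconds is an assumed threshold, not an
    Airflow setting.
    """
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # lines without a timestamp (e.g. traceback frames)
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > max_gap_seconds:
                gaps.append((prev_line.strip(), line.strip(), gap))
        prev_ts, prev_line = ts, line
    return gaps


# Three lines taken from the Case 1 log above.
sample = [
    "[2021-10-18 07:13:15,844] {{smart_sensor.py:747}} INFO - Taking 0.186399 to execute 0 tasks.",
    "[2021-10-18 07:16:15,794] {{smart_sensor.py:358}} INFO - Performance query 4 tis, time: 0.008",
    "[2021-10-18 07:35:25,736] {{process_utils.py:100}} INFO - Sending Signals.SIGTERM to GPID 13284",
]

gaps = find_log_gaps(sample)
for before, after, gap in gaps:
    print(f"{gap:.0f}s of silence before: {after}")
```

On these three lines the sketch reports a single gap, the one immediately before the SIGTERM entry.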
   
   
   **Case 2: Shard terminated after about 24 hours in which 0 tasks were sensed. The log does not make clear what caused the termination.**
   [2021-10-13 07:27:28,922] {{smart_sensor.py:358}} INFO - Performance query 0 tis, time: 0.013
   [2021-10-13 07:27:28,989] {{smart_sensor.py:373}} INFO - 0 tasks detected.
   [2021-10-13 07:27:29,021] {{smart_sensor.py:739}} INFO - Loaded 0 sensor_works
   [2021-10-13 07:27:29,044] {{smart_sensor.py:747}} INFO - Taking 0.135416 to execute 0 tasks.
   [2021-10-13 07:30:29,065] {{smart_sensor.py:358}} INFO - Performance query 0 tis, time: 0.012
   [2021-10-13 07:30:29,114] {{smart_sensor.py:373}} INFO - 0 tasks detected.
   [2021-10-13 07:30:29,141] {{smart_sensor.py:739}} INFO - Loaded 0 sensor_works
   [2021-10-13 07:30:29,163] {{smart_sensor.py:747}} INFO - Taking 0.11131 to execute 0 tasks.
   [2021-10-13 07:33:29,155] {{smart_sensor.py:358}} INFO - Performance query 0 tis, time: 0.006
   [2021-10-13 07:33:29,201] {{smart_sensor.py:373}} INFO - 0 tasks detected.
   .......
   [2021-10-14 07:25:25,821] {{smart_sensor.py:358}} INFO - Performance query 0 tis, time: 0.006
   [2021-10-14 07:25:25,870] {{smart_sensor.py:373}} INFO - 0 tasks detected.
   [2021-10-14 07:25:25,896] {{smart_sensor.py:739}} INFO - Loaded 0 sensor_works
   [2021-10-14 07:25:25,922] {{smart_sensor.py:747}} INFO - Taking 0.10663 to execute 0 tasks.
   [2021-10-14 07:27:30,839] {{process_utils.py:100}} INFO - Sending Signals.SIGTERM to GPID 25285
   [2021-10-14 07:27:30,933] {{taskinstance.py:1265}} ERROR - Received SIGTERM. Terminating subprocesses.
   [2021-10-14 07:27:31,028] {{taskinstance.py:1482}} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1138, in _run_raw_task
       self._prepare_and_execute_task_with_callbacks(context, task)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1311, in _prepare_and_execute_task_with_callbacks
       result = self._execute_task(context, task_copy)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1341, in _execute_task
       result = task_copy.execute(context=context)
     File "/usr/local/lib/python3.7/site-packages/airflow/sensors/smart_sensor.py", line 754, in execute
       sleep(self.poke_interval - duration)
     File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 1267, in signal_handler
       raise AirflowException("Task received SIGTERM signal")
   airflow.exceptions.AirflowException: Task received SIGTERM signal
   [2021-10-14 07:27:31,060] {{taskinstance.py:1532}} INFO - Marking task as UP_FOR_RETRY. dag_id=smart_sensor_group_shard_1, task_id=smart_sensor_task, execution_date=20211013T072226, start_date=20211013T072728, end_date=20211014T072731
   [2021-10-14 07:27:31,146] {{process_utils.py:66}} INFO - Process psutil.Process(pid=25285, status='terminated', exitcode=1, started='2021-10-13 07:27:28') (25285) terminated with exit code 1
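
The Case 2 traceback shows the exception surfacing inside `sleep(...)`, raised from a `signal_handler`. This means the shard was idle between pokes when the SIGTERM arrived; the traceback does not say who sent the signal. A standalone sketch of that mechanism, using only the Python standard library on a Unix system, is below. `TaskTerminated` and `run_poke_loop` are hypothetical stand-ins for illustration, not Airflow code.

```python
import signal
import threading
import time


class TaskTerminated(Exception):
    """Hypothetical stand-in for the exception raised by the task's handler."""


def signal_handler(signum, frame):
    # Raising here makes the exception appear at whatever line the main
    # thread was executing, which is why the traceback ends inside sleep().
    raise TaskTerminated("Task received SIGTERM signal")


def run_poke_loop(poke_interval):
    """Sleep for one poke interval and report whether a signal cut it short."""
    signal.signal(signal.SIGTERM, signal_handler)
    try:
        time.sleep(poke_interval)
    except TaskTerminated as exc:
        return f"interrupted: {exc}"
    return "slept full interval"


# Simulate an external sender: deliver SIGTERM to the main thread after 0.2 s.
main_thread_id = threading.main_thread().ident
threading.Timer(
    0.2, signal.pthread_kill, args=(main_thread_id, signal.SIGTERM)
).start()

result = run_poke_loop(5)
print(result)
```

The sleep ends early because the handler raises. To find the sender of the signal, the scheduler or worker logs from the same timestamp are the more likely place to look.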
   
   
   
   
   
   ### What you expected to happen
   
   1. Shards of SmartSensor should not be terminated unless an error has occurred.
   2. If a shard is terminated because of an error, the shard's task log should contain corresponding entries.
   
   ### How to reproduce
   
   1. Enable SmartSensor.
   2. Schedule DAGs that use SmartSensor-enabled operators.
   3. Let the DAGs run for a couple of hours or days.
   
   ### Anything else
   
   The termination recurs at irregular intervals, anywhere from seconds to hours apart, with no apparent pattern. This makes monitoring hard, because most of the resulting alerts are false alarms caused by the self-termination.
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


