[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198282#comment-17198282 ]
ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

yuqian90 commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-694797430

> After digging further, I think the slowness that causes the error for our case is in this function: `SchedulerJob._process_dags()`. If this function takes around 60s, those `reschedule` sensors will hit the `ERROR - Executor reports task instance ... killed externally?` error. My previous comment about adding the `time.sleep(30)` is just one way to replicate this issue. Anything that causes `_process_dags()` to slow down should be able to replicate this error.

Some further investigation shows that the slowdown that caused this issue in our case (Airflow 1.10.12) was in `SchedulerJob._process_task_instances`. This is periodically called in the `DagFileProcessor` process spawned by the Airflow scheduler. Anything that causes this function to take more than 60s seems to cause these `ERROR - Executor reports task instance ... killed externally?` errors for sensors in `reschedule` mode with a `poke_interval` of 60s.

I'm trying to address one of the causes of the `SchedulerJob._process_task_instances` slowdown for our own case in #11010, but that's not a fix for the other causes of this same error.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> Thousand os Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, scheduler
> Affects Versions: 1.10.3
> Reporter: msempere
> Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png,
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands
> of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the
> flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)