[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17198282#comment-17198282
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

yuqian90 commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-694797430


   > After digging further, I think the slowness that causes the error for our 
case is in this function: `SchedulerJob._process_dags()`. If this function 
takes around 60s, those `reschedule` sensors will hit the `ERROR - Executor 
reports task instance ... killed externally?` error. My previous comment about 
adding the `time.sleep(30)` is just one way to replicate this issue. Anything 
that causes `_process_dags()` to slow down should be able to replicate this 
error.
   
   Some further investigation shows that the slowdown that caused this issue 
for our case (Airflow 1.10.12) was in `SchedulerJob._process_task_instances`. 
This function is called periodically in the `DagFileProcessor` process spawned 
by the Airflow scheduler. Anything that makes this function take more than 60s 
seems to cause these `ERROR - Executor reports task instance ... killed 
externally?` errors for sensors in `reschedule` mode with a `poke_interval` of 
60s. I'm trying to address one of the causes of the 
`SchedulerJob._process_task_instances` slowdown for our own case in #11010, 
but that's not a fix for the other causes of this same error. 
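To check whether a deployment is hitting this kind of slowdown, one option is to time the scheduler function in question and flag passes longer than the 60s `poke_interval` mentioned above. The sketch below is a generic, hypothetical Python timing decorator: `warn_if_slower_than`, the `scheduler_timing` logger name and the `last_elapsed` attribute are illustrative assumptions, not Airflow APIs. It only measures and logs; it does not fix anything.

```python
import functools
import logging
import time

# Hypothetical logger name; not part of Airflow.
log = logging.getLogger("scheduler_timing")


def warn_if_slower_than(threshold_s):
    """Hypothetical helper: time every call of the wrapped function and log a
    warning when a single call runs longer than threshold_s seconds."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                # Record the duration even if the call raised.
                elapsed = time.monotonic() - start
                wrapper.last_elapsed = elapsed
                if elapsed > threshold_s:
                    log.warning(
                        "%s took %.1fs (threshold %.1fs)",
                        func.__qualname__, elapsed, threshold_s,
                    )
        wrapper.last_elapsed = None  # duration of the most recent call
        return wrapper
    return decorator
```

Hypothetically, in a local debugging setup one might wrap the method by reassignment, e.g. `SchedulerJob._process_task_instances = warn_if_slower_than(60)(SchedulerJob._process_task_instances)`, before the scheduler starts. This has not been verified against any Airflow release, and whether the patch reaches the spawned `DagFileProcessor` process depends on how that process is started.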
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since updating to 1.10.3 I've been seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
