yuqian90 commented on issue #10790: URL: https://github.com/apache/airflow/issues/10790#issuecomment-694797430
> After digging further, I think the slowness that causes the error in our case is in this function: `SchedulerJob._process_dags()`. If this function takes around 60s, those `reschedule` sensors will hit the `ERROR - Executor reports task instance ... killed externally?` error. My previous comment about adding the `time.sleep(30)` is just one way to replicate this issue. Anything that causes `_process_dags()` to slow down should be able to replicate this error.

Some further investigation shows that the slowdown that caused this issue in our case (Airflow 1.10.12) was in `SchedulerJob._process_task_instances`. This function is called periodically in the `DagFileProcessor` process spawned by the Airflow scheduler. Anything that makes it take more than 60s seems to cause these `ERROR - Executor reports task instance ... killed externally?` errors for sensors in `reschedule` mode with a `poke_interval` of 60s.

I'm trying to address one of the causes of the `SchedulerJob._process_task_instances` slowdown in our case in #11010, but that does not fix the other causes of the same error.
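The condition described above is about timing: the errors show up when one pass of `SchedulerJob._process_task_instances` takes longer than the sensors' `poke_interval`. Below is a minimal sketch for checking that condition. It assumes you can wrap the slow call yourself, for example in a locally patched scheduler. `timed_pass`, `exceeds_poke_interval` and the 60s default are hypothetical illustrations and are not Airflow APIs.

```python
import logging
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")
log = logging.getLogger("scheduler_timing")

# Hypothetical default: the poke_interval (seconds) of the reschedule-mode sensors.
DEFAULT_POKE_INTERVAL = 60.0


def timed_pass(fn: Callable[[], T]) -> Tuple[T, float]:
    """Run one pass (e.g. a stand-in for _process_task_instances) and time it."""
    start = time.monotonic()
    result = fn()
    elapsed = time.monotonic() - start
    return result, elapsed


def exceeds_poke_interval(elapsed: float,
                          poke_interval: float = DEFAULT_POKE_INTERVAL) -> bool:
    """Return True (and log a warning) if a pass took longer than poke_interval,
    the condition under which the 'killed externally?' errors were observed."""
    too_slow = elapsed > poke_interval
    if too_slow:
        log.warning("pass took %.1fs, longer than poke_interval=%.1fs",
                    elapsed, poke_interval)
    return too_slow


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    def slow_pass() -> str:
        time.sleep(0.2)  # stand-in for a slow scheduler pass
        return "done"

    _, elapsed = timed_pass(slow_pass)
    # A tiny poke_interval so the demo triggers the warning quickly.
    exceeds_poke_interval(elapsed, poke_interval=0.1)
```

Timing with `time.monotonic()` rather than `time.time()` keeps the measurement unaffected by wall-clock adjustments during a long pass.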
