[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17194850#comment-17194850
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

yuqian90 edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-690785232


   > Ok @yuqian90 @sgrzemski-ias what is your setting for core.dagbag_import_timeout?
   > 
   > As I'm hitting:
   > 
   > Traceback (most recent call last):
   >   File "/usr/local/lib/airflow/airflow/models/dagbag.py", line 237, in process_file
   >     m = imp.load_source(mod_name, filepath)
   >   File "/opt/python3.6/lib/python3.6/imp.py", line 172, in load_source
   >     module = _load(spec)
   >   File "<frozen importlib._bootstrap>", line 684, in _load
   >   File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
   >   File "<frozen importlib._bootstrap>", line 678, in exec_module
   >   File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
   >   File "/home/airflow/gcs/dags/test_dag_1.py", line 24, in <module>
   >     time.sleep(30)
   >   File "/usr/local/lib/airflow/airflow/utils/timeout.py", line 43, in handle_timeout
   >     raise AirflowTaskTimeout(self.error_message)
   > airflow.exceptions.AirflowTaskTimeout: Timeout, PID: 6217
   
   Hi @turbaszek, in my case I have `dagbag_import_timeout = 100` and 
`dag_file_processor_timeout = 300`. Most of the time DAG import takes about 
10s, but DAG file processing can take 60s, which is why that one is set to a 
larger number.
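
   For reference, this is roughly how those two settings look in `airflow.cfg` 
(a minimal sketch; both keys are assumed to sit under `[core]` as in the 
default 1.10 config, and the values are simply the ones quoted above):

   ```
   [core]
   # Seconds a DagBag import may take before importing a DAG file times out.
   dagbag_import_timeout = 100
   # Seconds the scheduler's DAG file processor may spend on a single DAG file.
   dag_file_processor_timeout = 300
   ```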
   
   After digging further, I think the slowness that causes the error in our 
case is in this function: `SchedulerJob._process_dags()`. If this function 
takes around 60s, those `reschedule` sensors will hit the `ERROR - Executor 
reports task instance ... killed externally?` error. My previous comment about 
adding the `time.sleep(30)` is just one way to replicate the issue; anything 
that causes `_process_dags()` to slow down should reproduce this error, as in 
the sketch below.
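
   For context, here is a minimal sketch of the kind of DAG file used to 
reproduce this (the file name, dag_id and task are illustrative, not the exact 
ones from the report; the module-level `time.sleep(30)` is the delay mentioned 
above, which slows down DAG file parsing and hence the scheduler loop):

   ```
   # test_dag_1.py -- illustrative reproduction sketch.
   # The module-level sleep runs every time the scheduler parses this file,
   # which is what makes DAG file processing (and _process_dags) slow.
   import time
   from datetime import datetime, timedelta

   from airflow import DAG
   from airflow.sensors.time_delta_sensor import TimeDeltaSensor

   time.sleep(30)  # simulate slow top-level code / expensive imports

   dag = DAG(
       dag_id="slow_parse_example",
       start_date=datetime(2020, 1, 1),
       schedule_interval="@daily",
   )

   # A reschedule-mode sensor: between pokes the task gives up its worker slot
   # and is re-queued, which is where the "killed externally?" race with the
   # scheduler shows up.
   wait = TimeDeltaSensor(
       task_id="wait_a_bit",
       delta=timedelta(minutes=5),
       mode="reschedule",
       poke_interval=60,
       dag=dag,
   )
   ```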
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because 
> the flag to send email on failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
