[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17506059#comment-17506059 ]

ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

aakashanand92 edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1066506022


   > We face the same issue with tasks that stay in the queued state indefinitely, 
   > except that we don't see tasks as `up_for_retry`. It happens randomly across 
   > our DAGs. The task stays queued forever until we manually mark it failed. We 
   > **don't use any sensors** at all. We are on an AWS MWAA instance (Airflow 2.0.2).
   > 
   > Example logs: Scheduler:
   > 
   > ```
   > [2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports task instance <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
   > [2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status failed for try_number 1
   > <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> in state FAILURE
   > ```
   > 
   > Worker:
   > 
   > ```
   > [2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task dag_id could not be found: task0. Either the dag did not exist or it failed to parse..
   > ```
   > 
   > This error is not seen in the worker logs for every occurrence in the scheduler logs.
   > 
   > Because of the MWAA autoscaling mechanism, `worker_concurrency` is not 
   > configurable. Our overrides are:
   > 
   > * `worker_autoscale`: `10, 10`
   > * `dagbag_import_timeout`: 120s
   > * `dag_file_processor_timeout`: 50s
   > * `parallelism`: 48
   > * `dag_concurrency`: 10000
   > * `max_threads`: 8
   > 
   > We currently run between 2 (minWorkers) and 10 (maxWorkers) mw1.medium 
   > (2 vCPU) workers.
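   
   For reference, the overrides quoted above map onto standard Airflow config 
   options. Below is a minimal sketch, assuming Airflow 2.0.x section and key 
   names (which MWAA 2.0.2 should use; `max_threads` is left out because its 
   name and section vary across versions), that prints the effective values so 
   they can be compared between environments:
   
   ```python
   # Minimal sketch: print the effective values of the overrides mentioned above.
   # Assumes Airflow 2.0.x section/key names; adjust if your version renamed them.
   from airflow.configuration import conf
   
   for section, key in [
       ("celery", "worker_autoscale"),
       ("core", "dagbag_import_timeout"),
       ("core", "dag_file_processor_timeout"),
       ("core", "parallelism"),
       ("core", "dag_concurrency"),
   ]:
       print(f"[{section}] {key} = {conf.get(section, key, fallback='<unset>')}")
   ```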
   
   Did you find a solution for this? I am also using an MWAA environment and 
   facing the same issue.
   
   The tasks get stuck in the queued state, and when I look at the scheduler logs 
   I can see the same error:
   
   "Executor reports task instance %s finished (%s) although the task says its 
%s. (Info: %s) Was the task killed externally?"
   
   I have tried everything I could find in this thread, but nothing seems to work.
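   
   For reference, a minimal diagnostic sketch (not a fix) that lists task 
   instances sitting in `queued` for more than an hour so they can be inspected 
   or cleared manually. It assumes Airflow 2.0.x model and column names such as 
   `TaskInstance.queued_dttm`:
   
   ```python
   # Diagnostic sketch only, not a fix. Assumes Airflow 2.0.x model/column names
   # (e.g. TaskInstance.queued_dttm). Lists task instances stuck in "queued" for
   # over an hour so they can be inspected or cleared by hand.
   from datetime import timedelta
   
   from airflow.models import TaskInstance
   from airflow.utils import timezone
   from airflow.utils.session import create_session
   from airflow.utils.state import State
   
   cutoff = timezone.utcnow() - timedelta(hours=1)
   
   with create_session() as session:
       stuck = (
           session.query(TaskInstance)
           .filter(TaskInstance.state == State.QUEUED)
           .filter(TaskInstance.queued_dttm < cutoff)
           .all()
       )
       for ti in stuck:
           print(ti.dag_id, ti.task_id, ti.execution_date, ti.queued_dttm)
   ```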


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the 
> flag to send an email on failure is set to True.-
> I have Airflow set up to use Celery and Redis as the backend queue service.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
