[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476036#comment-17476036 ]
ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

val2k edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1012934214

We face the same issue with tasks that stay indefinitely in a queued status, except that we don't see tasks as `up_for_retry`. It happens randomly within our DAGs: the task will stay in a queued status forever until we manually make it fail. We **don't use any sensors** at all. We are on an AWS MWAA instance (Airflow 2.0.2).

Example logs:

Scheduler:
```
[2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports task instance <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
[2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status failed for try_number 1 <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> in state FAILURE
```

Worker:
```
[2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task dag_id could not be found: task0. Either the dag did not exist or it failed to parse.
```

This worker-log line is not seen for every occurrence in the scheduler logs.

Because of the MWAA autoscaling mechanism, `worker_concurrency` is not configurable. Our settings:

- `worker_autoscale`: `10, 10`
- `dagbag_import_timeout`: 120s
- `dag_file_processor_timeout`: 50s
- `parallelism`: 48
- `dag_concurrency`: 10000
- `max_threads`: 8

We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) workers.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
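[Editor's note: not part of the original comment. As a rough sketch of the manual workaround described above (spotting tasks stuck in "queued" so they can be failed by hand), the snippet below filters generic task-instance records by state and queue age. The record shape, field names (`dag_id`, `task_id`, `state`, `queued_at`), and the 30-minute threshold are illustrative assumptions, not the Airflow ORM or any MWAA API.]

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: how long a task may sit in "queued" before we
# consider it stuck. Purely illustrative.
STUCK_THRESHOLD = timedelta(minutes=30)

def find_stuck_queued(task_instances, now=None):
    """Return (dag_id, task_id) pairs queued longer than STUCK_THRESHOLD.

    `task_instances` is a list of dicts with keys dag_id, task_id,
    state, queued_at (a hypothetical shape, not Airflow's model).
    """
    now = now or datetime.now(timezone.utc)
    return [
        (ti["dag_id"], ti["task_id"])
        for ti in task_instances
        if ti["state"] == "queued" and now - ti["queued_at"] > STUCK_THRESHOLD
    ]

# Example: task0 has been queued for two hours, task1 is running.
now = datetime(2022, 1, 14, 8, 0, tzinfo=timezone.utc)
tis = [
    {"dag_id": "dag0", "task_id": "task0", "state": "queued",
     "queued_at": now - timedelta(hours=2)},
    {"dag_id": "dag0", "task_id": "task1", "state": "running",
     "queued_at": now - timedelta(hours=2)},
]
print(find_stuck_queued(tis, now=now))  # [('dag0', 'task0')]
```

In a real deployment this filter would run against the Airflow metadata database or REST API rather than an in-memory list; the logic above only shows the state-plus-age check.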
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Thousands of Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, image-2020-07-08-07-58-42-972.png
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
>
> -And it looks like this is also triggering thousands of daily emails, because the flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)