[
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476036#comment-17476036
]
ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------
val2k edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1012934214
We face the same issue: tasks stay in the `queued` state indefinitely, except that we never
see them go to `up_for_retry`. It happens randomly across our DAGs, and an affected task
stays queued until we manually fail it. We **don't use any sensors** at all. We are on an
AWS MWAA environment (Airflow 2.0.2).
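To spot affected runs before they pile up, here is a minimal sketch that queries the metadata database for task instances sitting in `queued` past a threshold (assuming Airflow 2.0.x model and column names; `STUCK_AFTER` is an arbitrary cutoff of our own):
```python
# Hedged sketch: list task instances that have been QUEUED longer than a
# threshold by querying the Airflow metadata DB directly. Model/column names
# match Airflow 2.0.x; verify against your version before relying on this.
from datetime import timedelta

from airflow.models import TaskInstance
from airflow.utils import timezone
from airflow.utils.session import create_session
from airflow.utils.state import State

STUCK_AFTER = timedelta(hours=1)  # assumption: our definition of "stuck"

with create_session() as session:
    cutoff = timezone.utcnow() - STUCK_AFTER
    stuck = (
        session.query(TaskInstance)
        .filter(
            TaskInstance.state == State.QUEUED,
            TaskInstance.queued_dttm < cutoff,
        )
        .all()
    )
    for ti in stuck:
        print(ti.dag_id, ti.task_id, ti.execution_date, ti.queued_dttm)
```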
Example logs:
Scheduler:
```
[2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports task instance <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
[2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status failed for try_number 1
<TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> in state FAILURE
```
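For context, this error means the executor reported a terminal state while the task instance row in the metadata database was still `queued`, so the scheduler assumes the task was killed externally and fails it. A rough, simplified paraphrase of that check (illustration only, not the actual `scheduler_job.py` code):
```python
import logging

log = logging.getLogger(__name__)

# Simplified paraphrase: if the executor says the task finished but the DB row
# is still QUEUED, the scheduler logs the error above and fails the task.
def handle_executor_event(ti, executor_state):
    if executor_state in ("success", "failed") and ti.state == "queued":
        log.error(
            "Executor reports task instance %s finished (%s) although the "
            "task says its %s. Was the task killed externally?",
            ti, executor_state, ti.state,
        )
        ti.set_state("failed")  # the real code also handles retries and callbacks
```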
Worker:
```
[2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task dag_id could not be found: task0. Either the dag did not exist or it failed to parse..
```
This error does not appear in the worker logs for every occurrence in the scheduler logs.
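The `dag_id could not be found ... failed to parse` error suggests the worker could not (re-)parse the DAG file in time when it picked up the task. A quick way to check parse time and import errors locally, assuming the standard `DagBag` API in Airflow 2.0.x:
```python
# Hedged sketch: time how long the DAG folder takes to parse and surface any
# import errors, roughly what a worker has to do before it can run a task.
import time

from airflow.models import DagBag

start = time.monotonic()
dagbag = DagBag(include_examples=False)  # parses the configured DAG folder
elapsed = time.monotonic() - start

print(f"Parsed {len(dagbag.dags)} DAGs in {elapsed:.1f}s")
for path, error in dagbag.import_errors.items():
    print(f"IMPORT ERROR in {path}:\n{error}")
```
If the parse time here gets anywhere near `dagbag_import_timeout` (120s for us), that would line up with the worker-side failure.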
Because of the MWAA autoscaling mechanism, `worker_concurrency` is not configurable. Our
relevant settings are:
- `worker_autoscale`: `10, 10`
- `dagbag_import_timeout`: 120s
- `dag_file_processor_timeout`: 50s
- `parallelism`: 48
- `dag_concurrency`: 10000
- `max_threads`: 8

We currently run 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) workers.
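Since MWAA applies these as configuration overrides, it can be worth confirming what the running environment actually uses. A minimal sketch that reads the effective values through Airflow's config API (option names as in Airflow 2.0.x; note that `scheduler.max_threads` was renamed to `scheduler.parsing_processes` in 2.0):
```python
# Hedged sketch: print the effective values of the settings listed above, as
# seen by the running Airflow installation (e.g. from a PythonOperator task).
from airflow.configuration import conf

for section, key in [
    ("celery", "worker_autoscale"),
    ("core", "dagbag_import_timeout"),
    ("core", "dag_file_processor_timeout"),
    ("core", "parallelism"),
    ("core", "dag_concurrency"),
    ("scheduler", "parsing_processes"),  # Airflow 2.x name for the old max_threads
]:
    print(f"{section}.{key} = {conf.get(section, key, fallback='<unset>')}")
```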
> Thousands of Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, scheduler
> Affects Versions: 1.10.3
> Reporter: msempere
> Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png,
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands
> of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because
> the flag to send an email on failure is set to True.-
> I have Airflow set up to use Celery and Redis as the backend queue service.