[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17476036#comment-17476036 ]
ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

val2k edited a comment on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-1012934214

We face the same issue with tasks that stay indefinitely in a queued status, except that we don't see tasks as `up_for_retry`. It happens randomly within our DAGs: the task will stay in a queued status forever until we manually make it fail. We **don't use any sensors** at all. We are on an AWS MWAA instance (Airflow 2.0.2).

Example logs:

Scheduler:
```
[2022-01-14 08:03:32,868] {{scheduler_job.py:1239}} ERROR - Executor reports task instance <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> finished (failed) although the task says its queued. (Info: None) Was the task killed externally?
[2022-01-14 08:03:32,845] {{scheduler_job.py:1210}} INFO - Executor reports execution of task0 execution_date=2022-01-13 07:00:00+00:00 exited with status failed for try_number 1 <TaskInstance: task0 2022-01-13 07:00:00+00:00 [queued]> in state FAILURE
```

Worker:
```
[2021-04-20 20:54:29,109: ERROR/ForkPoolWorker-15] Failed to execute task dag_id could not be found: task0. Either the dag did not exist or it failed to parse.
```

This worker-log line is not seen for every occurrence in the scheduler logs.

Because of the MWAA autoscaling mechanism, `worker_concurrency` is not configurable. Our settings:

- `worker_autoscale`: `10, 10`
- `dagbag_import_timeout`: 120s
- `dag_file_processor_timeout`: 50s
- `parallelism`: 48
- `dag_concurrency`: 10000
- `max_threads`: 8

We currently have 2 (minWorkers) to 10 (maxWorkers) mw1.medium (2 vCPU) workers.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
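[Editor's note: not part of the original comment. As a rough sketch of the manual workaround described above (spotting tasks stuck in "queued" so they can be failed by hand), the snippet below filters generic task-instance records by state and queue age. The record shape, field names (`dag_id`, `task_id`, `state`, `queued_at`), and the 30-minute threshold are illustrative assumptions, not the Airflow ORM or any MWAA API.]

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: how long a task may sit in "queued" before we
# consider it stuck. Purely illustrative.
STUCK_THRESHOLD = timedelta(minutes=30)

def find_stuck_queued(task_instances, now=None):
    """Return (dag_id, task_id) pairs queued longer than STUCK_THRESHOLD.

    `task_instances` is a list of dicts with keys dag_id, task_id,
    state, queued_at (a hypothetical shape, not Airflow's model).
    """
    now = now or datetime.now(timezone.utc)
    return [
        (ti["dag_id"], ti["task_id"])
        for ti in task_instances
        if ti["state"] == "queued" and now - ti["queued_at"] > STUCK_THRESHOLD
    ]

# Example: task0 has been queued for two hours, task1 is running.
now = datetime(2022, 1, 14, 8, 0, tzinfo=timezone.utc)
tis = [
    {"dag_id": "dag0", "task_id": "task0", "state": "queued",
     "queued_at": now - timedelta(hours=2)},
    {"dag_id": "dag0", "task_id": "task1", "state": "running",
     "queued_at": now - timedelta(hours=2)},
]
print(find_stuck_queued(tis, now=now))  # [('dag0', 'task0')]
```

In a real deployment this filter would run against the Airflow metadata database or REST API rather than an in-memory list; the logic above only shows the state-plus-age check.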
To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org

> Thousands of Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, image-2020-07-08-07-58-42-972.png
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says its queued. Was the task killed externally?
> ```
>
> -And it looks like this is also triggering thousands of daily emails, because the flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as a backend queue service.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)