[ 
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17191995#comment-17191995
 ] 

ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------

dmariassy opened a new issue #10790:
URL: https://github.com/apache/airflow/issues/10790


   **Apache Airflow version**: 1.10.9
   
   
   **Kubernetes version (if you are using kubernetes)** (use `kubectl 
version`): Server: v1.10.13, Client: v1.17.0
   
   **Environment**:
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release): Debian GNU/Linux 9 (stretch)
   - **Kernel** (e.g. `uname -a`): `Linux airflow-web-54fc4fb694-ftkp5 
4.19.123-coreos #1 SMP Fri May 22 19:21:11 -00 2020 x86_64 GNU/Linux`
   - **Others**: Redis, CeleryExecutor
   
   **What happened**:
   
   In line with the guidelines laid out in 
[AIRFLOW-7120](https://issues.apache.org/jira/browse/AIRFLOW-7120), I'm copying 
over a JIRA for a bug that has significant negative impact on our pipeline 
SLAs. The original ticket is 
[AIRFLOW-5071](https://issues.apache.org/jira/browse/AIRFLOW-5071) which has a 
lot of details from various users who use ExternalTaskSensors in reschedule 
mode and see their tasks going through the following unexpected state 
transitions:
   
   running -> up_for_reschedule -> scheduled -> queued -> **up_for_retry**
   
   In our case, this issue seems to affect approximately ~2000 tasks per day.
   
   <img width="1225" alt="Screenshot 2020-09-08 at 09 01 03" 
src="https://user-images.githubusercontent.com/9336831/92443593-02b35000-f1b2-11ea-9249-420b6e656b49.png";>
   
   **What you expected to happen**:
   
   I would expect that tasks would go through the following state transitions 
instead: running -> up_for_reschedule -> scheduled -> queued -> **running**
   
   **How to reproduce it**:
   
   Unfortunately, I don't have configuration available that could be used to 
easily reproduce the issue at the moment. However, based on the thread in 
AIRFLOW-5071, the problem seems to arise in deployments that use a large number 
of sensors in reschedule mode.
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


> Thousand os Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I update to 1.10.3 I'm seeing thousands 
> of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And looks like this is triggering also thousand of daily emails because the 
> flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to