[ https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164361#comment-17164361 ]

Ignacio Peluffo commented on AIRFLOW-5071:
------------------------------------------

[~turbaszek]
 * `max_active_runs_per_dag = 1`
 * `dag_concurrency = 16` (default value); for the DAG reporting the error, `concurrency = 6`
 * `parallelism = 32`
 * Are you using CeleryExecutor? Yes
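For reference, the values above live in airflow.cfg; a sketch of the relevant fragment, assuming Airflow 1.10.x section names (the per-DAG `concurrency = 6` is set on the DAG object itself, not in the config file):

{code}
[core]
# Values reported above; section/key names assumed for Airflow 1.10.x
executor = CeleryExecutor
parallelism = 32
dag_concurrency = 16
max_active_runs_per_dag = 1
{code}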

Other notes that may help:
 * The sensor sometimes runs for more than an hour when it reports the error

Sensor setup:

 
{code:python}
ExternalTaskSensor(
    dag=dag,
    task_id='xxxxx.sensor',
    external_dag_id='xxxxx',
    external_task_id=None,
    execution_date_fn=xxxxx,
    timeout=2 * 60 * 60,  # 2 hours
    mode='reschedule',
    sla=DAG_SLA,
    email=SLA_ALERT_EMAIL,
)
{code}
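The actual `execution_date_fn` is redacted (`xxxxx`), but the logs below poke for 2020-07-23T02:00:00 on the 2020-07-23T03:00:00 run, which is consistent with a one-hour shift. A hypothetical sketch of such a function:

{code:python}
from datetime import timedelta

# Hypothetical stand-in for the redacted execution_date_fn above; assumes
# the upstream DAG runs one hour earlier, as the poke timestamps suggest.
def upstream_execution_date(execution_date):
    """Return the upstream execution date to poke for."""
    return execution_date - timedelta(hours=1)
{code}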
Logs:
{code}
*** Reading remote log from s3://.../1.log.
[2020-07-24 03:00:23,650] {taskinstance.py:655} INFO - Dependencies all met for 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:00:23,674] {taskinstance.py:655} INFO - Dependencies all met for 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:00:23,674] {taskinstance.py:866} INFO - 
--------------------------------------------------------------------------------
[2020-07-24 03:00:23,674] {taskinstance.py:867} INFO - Starting attempt 1 of 4
[2020-07-24 03:00:23,675] {taskinstance.py:868} INFO - 
--------------------------------------------------------------------------------
[2020-07-24 03:00:23,699] {taskinstance.py:887} INFO - Executing 
<Task(ExternalTaskSensor): xxxxx.sensor> on 2020-07-23T03:00:00+00:00
[2020-07-24 03:00:23,703] {standard_task_runner.py:53} INFO - Started process 
8363 to run task
[2020-07-24 03:00:23,856] {logging_mixin.py:112} INFO - Running %s on host %s 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [running]> xxxxx
[2020-07-24 03:00:23,914] {external_task_sensor.py:92} INFO - Poking for 
xxxxx.None on 2020-07-23T02:00:00+00:00 ... 
[2020-07-24 03:00:23,968] {taskinstance.py:1119} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2020-07-24 03:01:55,563] {taskinstance.py:655} INFO - Dependencies all met for <TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:01:55,595] {taskinstance.py:655} INFO - Dependencies all met for 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:01:55,596] {taskinstance.py:866} INFO - 
--------------------------------------------------------------------------------
[2020-07-24 03:01:55,596] {taskinstance.py:867} INFO - Starting attempt 1 of 4
[2020-07-24 03:01:55,596] {taskinstance.py:868} INFO - 
--------------------------------------------------------------------------------
[2020-07-24 03:01:55,619] {taskinstance.py:887} INFO - Executing 
<Task(ExternalTaskSensor): xxxxx.sensor> on 2020-07-23T03:00:00+00:00
[2020-07-24 03:01:55,624] {standard_task_runner.py:53} INFO - Started process 
5258 to run task
[2020-07-24 03:01:55,746] {logging_mixin.py:112} INFO - Running %s on host %s 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [running]> xxxxx
[2020-07-24 03:01:55,814] {external_task_sensor.py:92} INFO - Poking for 
xxxxx.None on 2020-07-23T02:00:00+00:00 ... 
[2020-07-24 03:01:55,863] {taskinstance.py:1119} INFO - Rescheduling task, marking task as UP_FOR_RESCHEDULE
[2020-07-24 03:03:34,911] {taskinstance.py:649} INFO - Dependencies not met for <TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [up_for_retry]>, dependency 'Not In Retry Period' FAILED: Task is not ready for retry yet but will be retried automatically. Current date is 2020-07-24T03:03:34.911000+00:00 and task will be retried at 2020-07-24T03:08:24.850228+00:00.
[2020-07-24 03:09:07,903] {taskinstance.py:655} INFO - Dependencies all met for <TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:09:07,927] {taskinstance.py:655} INFO - Dependencies all met for 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:09:07,927] {taskinstance.py:866} INFO - 
....
{code}
After approximately 1 hour and 13 minutes of similar logs ...
{code}
--------------------------------------------------------------------------------
[2020-07-24 04:13:40,840] {taskinstance.py:867} INFO - Starting attempt 1 of 4
[2020-07-24 04:13:40,840] {taskinstance.py:868} INFO - 
--------------------------------------------------------------------------------
[2020-07-24 04:13:40,866] {taskinstance.py:887} INFO - Executing 
<Task(ExternalTaskSensor): xxxxx.sensor> on 2020-07-23T03:00:00+00:00
[2020-07-24 04:13:40,872] {standard_task_runner.py:53} INFO - Started process 
11438 to run task
[2020-07-24 04:13:41,002] {logging_mixin.py:112} INFO - Running %s on host %s 
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [running]> xxxxx
[2020-07-24 04:13:41,054] {external_task_sensor.py:92} INFO - Poking for 
xxxxx.None on 2020-07-23T02:00:00+00:00 ... 
[2020-07-24 04:13:41,070] {base_sensor_operator.py:123} INFO - Success criteria 
met. Exiting.
[2020-07-24 04:13:41,085] {taskinstance.py:1048} INFO - Marking task as 
SUCCESS.dag_id=xxxxx, task_id=xxxxx.sensor, execution_date=20200723T030000, 
start_date=20200724T030023, end_date=20200724T041341
{code}
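So the sensor succeeded roughly 1 hour 13 minutes after its first poke, well inside its 2-hour timeout, even though the scheduler had already flipped it to up_for_retry along the way. Checking the arithmetic from the timestamps above:

{code:python}
from datetime import datetime

# Timestamps taken from the logs above
first_attempt = datetime(2020, 7, 24, 3, 0, 23)
success = datetime(2020, 7, 24, 4, 13, 41)

elapsed = (success - first_attempt).total_seconds()
timeout = 2 * 60 * 60  # timeout=2 * 60 * 60 from the sensor setup above

print(elapsed)            # 4398.0 seconds, about 1 hour 13 minutes
print(elapsed < timeout)  # True: the sensor never hit its own timeout
{code}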

> Thousands of Executor reports task instance X finished (success) although the 
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5071
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, scheduler
>    Affects Versions: 1.10.3
>            Reporter: msempere
>            Priority: Critical
>             Fix For: 1.10.12
>
>         Attachments: image-2020-01-27-18-10-29-124.png, 
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I've been seeing 
> thousands of daily messages like the following in the logs:
>  
> ```
>  {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X 
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says 
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because 
> the flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
