[
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17164361#comment-17164361
]
Ignacio Peluffo commented on AIRFLOW-5071:
------------------------------------------
[~turbaszek]
* `max_active_runs_per_dag = 1`
* `dag_concurrency = 16` (default value); the DAG reporting the error has `concurrency = 6`
* `parallelism = 32`
* Are you using CeleryExecutor? Yes
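For clarity, a quick sketch of where these values come from (assuming a stock 1.10.x `airflow.cfg`; the per-DAG `concurrency = 6` is set on the DAG object itself, not in the config file):
{code:python}
# Sketch only: reading the settings above back from airflow.cfg at runtime.
# All three keys live in the [core] section in 1.10.x.
from airflow.configuration import conf

print(conf.getint('core', 'parallelism'))              # 32
print(conf.getint('core', 'dag_concurrency'))          # 16
print(conf.getint('core', 'max_active_runs_per_dag'))  # 1
{code}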
Other notes that may help:
* The sensor sometimes runs for more than an hour when it reports the error
Sensor setup:
{code:python}
ExternalTaskSensor(
    dag=dag,
    task_id='xxxxx.sensor',
    external_dag_id='xxxxx',
    external_task_id=None,
    execution_date_fn=xxxxx,
    timeout=2 * 60 * 60,  # 2 hours
    mode='reschedule',
    sla=DAG_SLA,
    email=SLA_ALERT_EMAIL,
)
{code}
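The logs below show the sensor poking the external DAG one hour behind its own execution date (the 03:00 run pokes 02:00), so the redacted `execution_date_fn` presumably looks something like the following (hypothetical sketch; the real function and DAG names are redacted as xxxxx):
{code:python}
from datetime import timedelta

# Hypothetical reconstruction of the redacted execution_date_fn: the logs show
# the 2020-07-23T03:00:00 run poking the external DAG at 2020-07-23T02:00:00,
# i.e. a one-hour offset.
def external_execution_date(execution_date):
    return execution_date - timedelta(hours=1)
{code}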
Logs:
{code:java}
*** Reading remote log from s3://.../1.log.
[2020-07-24 03:00:23,650] {taskinstance.py:655} INFO - Dependencies all met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:00:23,674] {taskinstance.py:655} INFO - Dependencies all met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:00:23,674] {taskinstance.py:866} INFO -
--------------------------------------------------------------------------------
[2020-07-24 03:00:23,674] {taskinstance.py:867} INFO - Starting attempt 1 of 4
[2020-07-24 03:00:23,675] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-07-24 03:00:23,699] {taskinstance.py:887} INFO - Executing
<Task(ExternalTaskSensor): xxxxx.sensor> on 2020-07-23T03:00:00+00:00
[2020-07-24 03:00:23,703] {standard_task_runner.py:53} INFO - Started process
8363 to run task
[2020-07-24 03:00:23,856] {logging_mixin.py:112} INFO - Running %s on host %s
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [running]> xxxxx
[2020-07-24 03:00:23,914] {external_task_sensor.py:92} INFO - Poking for
xxxxx.None on 2020-07-23T02:00:00+00:00 ...
[2020-07-24 03:00:23,968] {taskinstance.py:1119} INFO - Rescheduling task,
marking task as UP_FOR_RESCHEDULE
[2020-07-24 03:01:55,563] {taskinstance.py:655} INFO - Dependencies all met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:01:55,595] {taskinstance.py:655} INFO - Dependencies all met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:01:55,596] {taskinstance.py:866} INFO -
--------------------------------------------------------------------------------
[2020-07-24 03:01:55,596] {taskinstance.py:867} INFO - Starting attempt 1 of 4
[2020-07-24 03:01:55,596] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-07-24 03:01:55,619] {taskinstance.py:887} INFO - Executing
<Task(ExternalTaskSensor): xxxxx.sensor> on 2020-07-23T03:00:00+00:00
[2020-07-24 03:01:55,624] {standard_task_runner.py:53} INFO - Started process
5258 to run task
[2020-07-24 03:01:55,746] {logging_mixin.py:112} INFO - Running %s on host %s
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [running]> xxxxx
[2020-07-24 03:01:55,814] {external_task_sensor.py:92} INFO - Poking for
xxxxx.None on 2020-07-23T02:00:00+00:00 ...
[2020-07-24 03:01:55,863] {taskinstance.py:1119} INFO - Rescheduling task,
marking task as UP_FOR_RESCHEDULE
[2020-07-24 03:03:34,911] {taskinstance.py:649} INFO - Dependencies not met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [up_for_retry]>,
dependency 'Not In Retry Period' FAILED: Task is not ready for retry yet but
will be retried automatically. Current date is 2020-07-24T03:03:34.911000+00:00
and task will be retried at 2020-07-24T03:08:24.850228+00:00.
[2020-07-24 03:09:07,903] {taskinstance.py:655} INFO - Dependencies all met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:09:07,927] {taskinstance.py:655} INFO - Dependencies all met for
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [queued]>
[2020-07-24 03:09:07,927] {taskinstance.py:866} INFO -
....
{code}
After approximately 1 hour and 13 minutes of similar logs ...
{code:java}
--------------------------------------------------------------------------------
[2020-07-24 04:13:40,840] {taskinstance.py:867} INFO - Starting attempt 1 of 4
[2020-07-24 04:13:40,840] {taskinstance.py:868} INFO -
--------------------------------------------------------------------------------
[2020-07-24 04:13:40,866] {taskinstance.py:887} INFO - Executing
<Task(ExternalTaskSensor): xxxxx.sensor> on 2020-07-23T03:00:00+00:00
[2020-07-24 04:13:40,872] {standard_task_runner.py:53} INFO - Started process
11438 to run task
[2020-07-24 04:13:41,002] {logging_mixin.py:112} INFO - Running %s on host %s
<TaskInstance: xxxxx.sensor 2020-07-23T03:00:00+00:00 [running]> xxxxx
[2020-07-24 04:13:41,054] {external_task_sensor.py:92} INFO - Poking for
xxxxx.None on 2020-07-23T02:00:00+00:00 ...
[2020-07-24 04:13:41,070] {base_sensor_operator.py:123} INFO - Success criteria
met. Exiting.
[2020-07-24 04:13:41,085] {taskinstance.py:1048} INFO - Marking task as
SUCCESS.dag_id=xxxxx, task_id=xxxxx.sensor, execution_date=20200723T030000,
start_date=20200724T030023, end_date=20200724T041341
{code}
> Thousands of "Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?"
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, scheduler
> Affects Versions: 1.10.3
> Reporter: msempere
> Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png,
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because, since I updated to 1.10.3, I'm seeing thousands
> of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the
> flag to send an email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as the backend queue service.