[
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193516#comment-17193516
]
ASF GitHub Bot commented on AIRFLOW-5071:
-----------------------------------------
yuqian90 commented on issue #10790:
URL: https://github.com/apache/airflow/issues/10790#issuecomment-690136389
Hi @turbaszek, any findings on this? We have a CeleryExecutor + Redis setup
with three workers (apache-airflow 1.10.12). The airflow-scheduler log has a
lot of lines like this. I remember this was already a problem when we were
using older versions such as 1.10.10; it's just that we never paid much
attention to it.
```
{taskinstance.py:1150} ERROR - Executor reports task instance <TaskInstance:
... [queued]> finished (success) although the task says its queued. Was the
task killed externally?
```
Like others in this thread, we have a lot of sensors in `reschedule` mode
with `poke_interval` set to 60s. These are the ones that most often hit this
error. So far our workaround has been to add `retries=3` to these sensors;
that way, when the error happens, the task retries and we don't get any spam.
Such sensors go into `up_for_retry` state when this happens, so this is
definitely not a great long-term solution.
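For what it's worth, the same workaround can also be applied DAG-wide through
`default_args` rather than on each sensor individually (a hypothetical sketch;
`retries` and `retry_delay` are standard Airflow task arguments, but the values
here are just what we happen to use):

```python
import datetime

# Sketch: apply the retries workaround DAG-wide via default_args, so every
# reschedule sensor retries instead of failing outright when the scheduler
# mis-reports it as killed externally.
default_args = {
    "retries": 3,  # absorb the spurious "killed externally" failures
    "retry_delay": datetime.timedelta(minutes=1),
}

# Then pass it to the DAG, e.g. DAG(dag_id="...", default_args=default_args, ...)
```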
I also tried to tweak these parameters. They don't seem to matter much as
far as this error is concerned:
```
parallelism = 1024
dag_concurrency = 128
max_threads = 8
```
The way to reproduce this issue seems to be to create a DAG with a bunch of
parallel `reschedule` sensors and make the DAG slow to import, for example
like the snippet below. Adding a `time.sleep(30)` at the end to simulate a
slow-to-import DAG makes this error happen a lot for such sensors. You may
also need to tweak `dagbag_import_timeout` and `dag_file_processor_timeout`
if adding the `sleep` causes DAGs to fail to import altogether.
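For reference, both timeouts live in the `[core]` section of `airflow.cfg`
(the values below are only illustrative, not a recommendation):

```ini
[core]
# Default is 30s; raise it so the artificial sleep doesn't abort the import.
dagbag_import_timeout = 90
# Must cover the whole file-processing run, including the sleep.
dag_file_processor_timeout = 120
```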
When the scheduler starts to process this DAG, we start to see the above
error happening to these sensors, and they go into `up_for_retry`.
```python
import datetime
import time

from airflow.models.dag import DAG
from airflow.contrib.sensors.python_sensor import PythonSensor

with DAG(
    dag_id="test_dag_slow",
    start_date=datetime.datetime(2020, 9, 8),
    schedule_interval="@daily",
) as dag:
    sensors = [
        PythonSensor(
            task_id=f"sensor_{i}",
            python_callable=lambda: False,  # never succeeds, keeps rescheduling
            mode="reschedule",
            poke_interval=60,
            retries=2,
        )
        for i in range(20)
    ]
    # Simulate a slow-to-import DAG file.
    time.sleep(30)
```
> Thousands of Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, scheduler
> Affects Versions: 1.10.3
> Reporter: msempere
> Priority: Critical
> Fix For: 1.10.12
>
> Attachments: image-2020-01-27-18-10-29-124.png,
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands
> of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails, because
> the flag to send email in case of failure is set to True.-
> I have Airflow setup to use Celery and Redis as a backend queue service.