[
https://issues.apache.org/jira/browse/AIRFLOW-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17153251#comment-17153251
]
Szymon Grzemski commented on AIRFLOW-5071:
------------------------------------------
In my case the "killed externally" message appeared right after the Executor
reported success for a task, even though the task instance was still in the
queued state and the underlying Celery task was still running.
{code:bash}
[2020-07-06 13:24:46,459] {base_executor.py:58} INFO - Adding to queue:
['airflow', 'run', 'dedupe-emr-job-flow', 'check_signals_table',
'2020-07-06T12:00:00+00:00', '--local', '--pool', 'default_pool', '-sd',
'/var/lib/airflow/dags/etl-airflow-dags/dedupe_emr_job_flow.py']
[2020-07-06 13:24:48,181] {scheduler_job.py:1311} INFO - Executor reports
execution of dedupe-emr-job-flow.check_signals_table execution_date=2020-07-06
12:00:00+00:00 exited with status success for try_number 1
[2020-07-06 13:24:48,189] {scheduler_job.py:1328} ERROR - Executor reports task
instance <TaskInstance: dedupe-emr-job-flow.check_signals_table 2020-07-06
12:00:00+00:00 [queued]> finished (success) although the task says its queued.
Was the task killed externally?
{code}
It was confirmed as a success after about 2 seconds, but in Flower I could see
it was still running until 13:25:06...
My WORKAROUND:
I did a LOT of debugging on this issue, because it was impacting our core
pipeline. After analysing the behaviour in debug mode, I checked the DAG
Processor's logs, and it turned out that the DagBag refresh takes a little over
10s for the longest DAG (a quick way to measure this yourself is sketched below
the screenshot). As a result, I set:
{code:bash}
min_file_process_interval = 15
{code}
under the [scheduler] section and restarted the scheduler. I haven't seen this
error for almost 24h now:
!image-2020-07-08-07-58-42-972.png|width=543,height=218!
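In case someone wants to check how long their own DagBag parse takes before
picking a value, here is a minimal sketch (the dag_folder path is just the one
from the log above, adjust it for your deployment):
{code:python}
# Minimal sketch: time how long it takes to build the DagBag.
# The dag_folder below is the path from the log above -- adjust for your setup.
import time

from airflow.models import DagBag

start = time.perf_counter()
dag_bag = DagBag(dag_folder="/var/lib/airflow/dags/etl-airflow-dags")
elapsed = time.perf_counter() - start

print(f"Parsed {len(dag_bag.dags)} DAGs in {elapsed:.1f}s, "
      f"{len(dag_bag.import_errors)} import errors")
{code}
If the time printed here is longer than your min_file_process_interval, raising
the interval above it is what made the error stop in my case.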
Maybe it could be a hint for [~kaxilnaik], [~potiuk] or other developers to fix
this issue :)
> Thousands of Executor reports task instance X finished (success) although the
> task says its queued. Was the task killed externally?
> ----------------------------------------------------------------------------------------------------------------------------------
>
> Key: AIRFLOW-5071
> URL: https://issues.apache.org/jira/browse/AIRFLOW-5071
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG, scheduler
> Affects Versions: 1.10.3
> Reporter: msempere
> Priority: Critical
> Attachments: image-2020-01-27-18-10-29-124.png,
> image-2020-07-08-07-58-42-972.png
>
>
> I'm opening this issue because since I updated to 1.10.3 I'm seeing thousands
> of daily messages like the following in the logs:
>
> ```
> {{__init__.py:1580}} ERROR - Executor reports task instance <TaskInstance: X
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> {{jobs.py:1484}} ERROR - Executor reports task instance <TaskInstance: X
> 2019-07-29 00:00:00+00:00 [queued]> finished (success) although the task says
> its queued. Was the task killed externally?
> ```
> -And it looks like this is also triggering thousands of daily emails because the
> flag to send email in case of failure is set to True.-
> I have Airflow set up to use Celery and Redis as the backend queue service.