Uri Shamay created AIRFLOW-928:
----------------------------------

             Summary: Same task,execution_date run multiple times in worker 
when using CeleryExecutor
                 Key: AIRFLOW-928
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-928
             Project: Apache Airflow
          Issue Type: Bug
          Components: celery
    Affects Versions: Airflow 1.7.1.3
         Environment: Docker
            Reporter: Uri Shamay
         Attachments: airflow.log, worker.log

Hi,

When using with Airflow with CeleryExecutor, both RabbitMQ && Redis I tested, I 
see that when workers are down, the scheduler run each period of time 
**append** to the same key of {task,execution_date} in the broker, the same 
{task,execution_date}, what means is that if workers are down/can't connect to 
broker for few hours, I got in the broker thousands of same executions.
In my scenario I have just one dummy dag to run with dag_concurrency of 4,
I expected in that scenario that broker will hold just 4 messages, and the 
scheduler shouldn't queuing another and another and another for same {task, 
execution_date}

What happened is that when workers start to consume messages, they got 
thousands of tasks for just 4 tasks, and when they trying to write to database 
for task_instances - there are errors of integrity while such 
{task,execution_date} already exist.

Attached 2 files:

1. airflow.log - this is the task log, you can see that few instances processes 
of same {task,execution_date} write to the same log file.

2. worker.log - this is the worker log, you can see that worker trying to run 
same {task,execution_date} multiple times + the errors from the database 
integrity that said that those tasks on those dates already exists.

Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to