[
https://issues.apache.org/jira/browse/AIRFLOW-462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bolke de Bruin updated AIRFLOW-462:
-----------------------------------
Issue Type: Wish (was: Bug)
> Concurrent Scheduler Jobs pushing the same task to queue
> --------------------------------------------------------
>
> Key: AIRFLOW-462
> URL: https://issues.apache.org/jira/browse/AIRFLOW-462
> Project: Apache Airflow
> Issue Type: Wish
> Components: scheduler
> Affects Versions: Airflow 1.7.0
> Reporter: Yogesh
>
> Hi,
> We are using Airflow version 1.7.0 and tried to implement high
> availability for the Airflow daemons in our production environment.
> Detailed high availability approach:
> - Airflow running on two different machines with all the
> daemons (webserver, scheduler, executor)
> - A single MySQL database shared by both schedulers
> - DAG files replicated on both machines
> - A single RabbitMQ instance running as the message broker
> While doing so we came across the following problem:
> - A particular task was sent to the executor twice (two entries in the
> message queue) by two different schedulers. However, we see only a single
> entry for the task instance in the database, which is correct.
> We checked the code and found the following:
> - Before sending a task to the executor, the scheduler checks the task
> state in the database, and if it is not already QUEUED it pushes the task
> to the queue.
> Issue:
> There is no locking on the task instance in the database, and both
> scheduler jobs run so close together that the second one might check the
> status in the database before the first one updates it to QUEUED.
> We are not sure whether this issue has been taken care of in a recent
> release. Could you please suggest an appropriate approach so that high
> availability can be achieved?
> Thanks
> Yogesh
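
The check-then-queue race described in the report can be illustrated with a small sketch. This is not Airflow's code: the `task_instance` table, its columns, and the `try_claim` helper are hypothetical, and SQLite stands in for the MySQL database mentioned above. The idea shown is one common way to close such a window: fold the check and the update into a single conditional UPDATE, and only push to the queue if that statement actually changed a row.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified schema; not Airflow's real task_instance table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)"
    )
    conn.execute("INSERT INTO task_instance VALUES ('t1', 'scheduled')")
    conn.commit()
    return conn


def try_claim(conn, task_id):
    """Set the task to 'queued' only if it is not queued already.

    The state check and the write happen in one statement, so there is no
    gap between them for a second scheduler to slip into. Returns True only
    for the caller whose UPDATE changed the row.
    """
    cur = conn.execute(
        "UPDATE task_instance SET state = 'queued' "
        "WHERE task_id = ? AND state != 'queued'",
        (task_id,),
    )
    conn.commit()
    return cur.rowcount == 1


conn = make_db()
sent = []  # stands in for the message queue
for scheduler in ("scheduler-a", "scheduler-b"):
    # Each scheduler pushes the task only if it won the claim.
    if try_claim(conn, "t1"):
        sent.append(scheduler)
```

In this sketch only the first scheduler's claim succeeds, so the task is pushed once. Whether the same pattern fits Airflow's scheduler and a MySQL backend would need to be checked against the actual code and schema.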