[ https://issues.apache.org/jira/browse/AIRFLOW-470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566196#comment-15566196 ]
Laura Lorenz commented on AIRFLOW-470:
--------------------------------------

Dup of AIRFLOW-471?

> Frequent multiple dispatching of the same task to celery
> --------------------------------------------------------
>
>                 Key: AIRFLOW-470
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-470
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: celery, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Jasmine Tsai
>            Priority: Critical
>
> We are seeing frequent dispatching of the same task to celery within
> a very short time frame (same task instance by Airflow conditions, but a
> different celery task uuid), which is causing a lot of unexpected behavior
> for us. Most of these are annoying but harmless: sometimes they clear xcom
> data and overwrite logs, but for the most part they are able to rely on the
> db metadata and do not try to run themselves multiple times. We are seeing
> this behavior frequently; some tasks are getting scheduled 5 times within
> the span of two minutes. The issue seems to be exacerbated by the use of pools.
> We have even seen the same task being dispatched twice within a second,
> causing real race conditions because the second try didn't yet see the task
> instance starting to run in the metadata db.
> It seems from other issues submitted here that people definitely see problems
> with the same tasks running multiple times, but this problem seems to be
> getting worse for us. Is it a known issue for the multiple dispatching to be
> so frequent/severe? (Or maybe even the intentional design/side effect?) Are
> there things that we could be doing that might make this worse? (One of our
> primary suspects is the scheduler, for which we have set num_runs to 1.)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)