[ https://issues.apache.org/jira/browse/AIRFLOW-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16640110#comment-16640110 ]
Stefan Seelmann commented on AIRFLOW-2001:
------------------------------------------

Yes, this issue requests what was implemented in AIRFLOW-2747. [~Fokko] could you please also close this one?

> Make sensors relinquish their execution slots
> ---------------------------------------------
>
>                 Key: AIRFLOW-2001
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2001
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: db, scheduler
>            Reporter: Yati
>            Assignee: Yati
>            Priority: Major
>
> A sensor task instance should not take up an execution slot for the entirety
> of its lifetime (as is currently the case). Indeed, for reasons outlined
> below, it would be better if sensor execution were preempted by the
> scheduler, parking the sensor away from the slot until the next poll.
>
> Some sensors wait for a condition that is affected only by an external party
> (e.g., materialization by external means of certain rows in a table). By
> external, I mean external to the Airflow installation in question, such that
> the producing entity itself does not need an execution slot in an Airflow
> pool. If all sensors and their dependencies were of this nature, there would
> be no issue. Unfortunately, a lot of real-world DAGs have sensor dependencies
> on results produced by another task, typically in some other DAG, but
> scheduled by the same Airflow scheduler.
>
> Consider a simple example (arrow direction represents "must happen before",
> just like in Airflow): DAG1(a >> b) and DAG2(c:sensor(DAG1.b) >> d). In other
> words, the opening task c of the second DAG has a sensor dependency on the
> ending task b of the first DAG. Imagine we have a single pool with 10
> execution slots, and somehow task instances for c fill up the pool, while the
> corresponding task instances of DAG1.b have not had a chance to execute (in
> the real world this happens because of, say, backfills or reprocessing by
> clearing those sensor instances and their upstream tasks). This is a
> deadlock, since no progress can be made: the sensors have filled up the pool
> waiting on tasks that themselves will never get a chance to run. This problem
> has been [acknowledged
> here|https://cwiki.apache.org/confluence/display/AIRFLOW/Common+Pitfalls].
>
> One way (suggested by Fokko) to solve this is to always run sensors in their
> own pool, and to be careful with the concurrency settings of sensor tasks.
> This is what a lot of users do now, but there are better solutions. Since all
> the sensor interface allows for is a poll, we can, after each poll, "park"
> the sensor, releasing its execution slot to other tasks. In the above
> scenario, the pool would never fill up with sensor tasks: each would be
> polled, found to be still unfulfilled, and then parked, giving other tasks a
> chance to run.
>
> This would likely require some changes to the DB, and of course to the
> scheduler.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
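
The deadlock described in the issue, and the way "parking" avoids it, can be reproduced with a small toy model. The sketch below is not Airflow code and does not reflect how the scheduler is implemented: the function `simulate`, its parameters, the tick-based loop, and the assumption that every producer task finishes in one tick are all made up for illustration. It models one shared pool in which sensor instances are queued ahead of the tasks they wait on, as in the backfill case.

```python
from collections import deque


def simulate(pool_slots, n_pairs, sensors_release_slot, max_ticks=100):
    """Toy model of one pool shared by sensors and the tasks they wait on.

    Pair i consists of a producer task (DAG1.b in the issue) and a sensor
    (DAG2.c) that succeeds only once its producer has finished.
    Returns the tick at which everything has finished, or None if nothing
    completes within max_ticks (treated here as a deadlock).
    """
    done = set()
    # Sensors are queued first, so they grab the slots before the producers.
    queue = deque(
        [("sensor", i) for i in range(n_pairs)]
        + [("producer", i) for i in range(n_pairs)]
    )
    running = []  # tasks currently holding a slot

    for tick in range(1, max_ticks + 1):
        # Hand free slots to queued tasks in FIFO order.
        while queue and len(running) < pool_slots:
            running.append(queue.popleft())

        still_running = []
        for kind, i in running:
            if kind == "producer":
                done.add((kind, i))          # assumed to finish in one tick
            elif ("producer", i) in done:
                done.add((kind, i))          # poll succeeded
            elif sensors_release_slot:
                queue.append((kind, i))      # park: free the slot, re-queue
            else:
                still_running.append((kind, i))  # keep holding the slot
        running = still_running

        if len(done) == 2 * n_pairs:
            return tick
    return None


if __name__ == "__main__":
    # 10 slots, 10 sensors queued ahead of their 10 producers.
    print("sensors hold slots:   ", simulate(10, 10, sensors_release_slot=False))
    print("sensors release slots:", simulate(10, 10, sensors_release_slot=True))
```

With sensors holding their slots, the ten sensors occupy all ten slots and the producers never start, so the function returns None. With sensors releasing their slots after an unsuccessful poll, the producers get the slots on the next tick and the run completes.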