One reason to use Pools is that you have tasks in different DAGs that all use the same resource, such as a database. A Pool lets you say, "I will send no more than 3 requests to this database at once." However, there are bugs in the scheduler, and it is possible for many active tasks to be overscheduled against a pool.
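To make that concrete, here is a minimal sketch of assigning tasks to a pool via the standard `pool` argument on an operator. The pool name, the command, and the weight are illustrative, and exact import paths vary across Airflow versions; this assumes a pool named 'database_pool' with 3 slots already exists (created via Admin -> Pools):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id='pool_example',
        start_date=datetime(2016, 6, 1),
        schedule_interval='@daily',
    )

    query = BashOperator(
        task_id='run_db_query',
        bash_command='echo "pretend this hits the database"',
        pool='database_pool',  # the scheduler should run at most 3 slots' worth of tasks from this pool at once
        priority_weight=10,    # higher weight is scheduled first when pool slots are scarce
        dag=dag,
    )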
You can create a pool in the Admin -> Pools drop-down. You don't need a script. (For a programmatic alternative, see the sketch after the quoted message below.)

On Mon, Jun 20, 2016 at 2:46 PM, harish singh <[email protected]> wrote:

> Hi,
>
> We have been using airflow for 3 months now.
>
> One pain I felt was: during backfill, if I have two tasks t1 and t2, with
> t1 having depends_on_past=True,
>
>     t0 -> t1
>     t0 -> t2
>
> I find that the task t2 with no past dependency keeps getting scheduled.
> This causes the task t1 to wait for a long time before it gets scheduled.
>
> I think this is a good use case for creating "pools" and allocating slots
> for each pool. I will also have to use priority_weights and adjust
> parallelism!
>
> Is there a better way to handle this?
>
> Also, in general, are there any examples of how to use pools?
>
> I peeked into airflow/tests/operators/subdag_operator.py and found the
> below snippet:
>
>     session = airflow.settings.Session()
>     pool_1 = airflow.models.Pool(pool='test_pool_1', slots=1)
>     session.add(pool_1)
>     session.commit()
>
> Why do we need a Session instance? Do we need to run the code below before
> creating a pool in code (inside my pipeline.py under the dags/ directory)?
>
>     pool = (
>         session.query(Pool)
>         .filter(Pool.pool == 'AIRFLOW-205')
>         .first())
>     if not pool:
>         session.add(Pool(pool='AIRFLOW-205', slots=8))
>         session.commit()
>
> Also, I saw a few places where pool: 'backfill' is used.
>
> Is 'backfill' a special pre-defined pool?
>
> If not, how do we create different types of pools based on whether it is
> a backfill or not?
>
> All of this is being done in the pipeline.py script under the 'dags/'
> directory.
>
> Thanks,
> Harish

--
Lance Norskog
[email protected]
Redwood City, CA
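For completeness, if you do want to create a pool from code (as the quoted test snippet does), here is a hedged get-or-create sketch. The Session is needed because pools are rows in Airflow's metadata database, so creating one means inserting a row via SQLAlchemy; 'AIRFLOW-205' and slots=8 are taken from the quoted example, not special values:

    from airflow import settings
    from airflow.models import Pool

    session = settings.Session()
    pool = (
        session.query(Pool)
        .filter(Pool.pool == 'AIRFLOW-205')
        .first())
    if not pool:
        # Get-or-create keeps this idempotent when the DAG file is re-parsed.
        session.add(Pool(pool='AIRFLOW-205', slots=8))
        session.commit()
    session.close()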
