At the moment backfill does not use a pool by default, but you can specify
one with --pool.
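
For example (the dag id, date range, and pool name here are placeholders, and
the exact flag spelling may differ between Airflow versions):

```shell
# Run a backfill whose tasks draw slots from an existing pool.
# "my_dag", the dates, and "db_pool" are hypothetical; the pool must
# already exist (Admin -> Pools in the UI, or created in code).
airflow backfill my_dag \
    --start_date 2016-06-01 \
    --end_date 2016-06-07 \
    --pool db_pool
```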

On Mon, Jun 20, 2016 at 9:02 PM, Chris Riccomini <[email protected]>
wrote:

> Hey Harish,
>
> One thing that I'm not clear on is whether backfill even honors pools at
> all. I believe backfill currently starts its own scheduler outside of the
> main scheduler process. As a result, I think the pools are completely
> disregarded. Bolke/Jeremiah/Paul can correct me if I'm wrong.
>
> Cheers,
> Chris
>
> On Mon, Jun 20, 2016 at 7:46 PM, Lance Norskog <[email protected]>
> wrote:
>
> > One reason to use Pools is because you have tasks in different DAGs that
> > all use the same resource, like a database. A Pool lets you say, "I will
> > send no more than 3 requests to this database at once". However, there
> are
> > bugs in the scheduler and it is possible to have many active tasks
> > overscheduled against a pool.
> >
> > You can create a pool in the Admin->Pools drop-down. You don't need a
> > script.
> >
> > On Mon, Jun 20, 2016 at 2:46 PM, harish singh <[email protected]>
> > wrote:
> >
> > > Hi,
> > >
> > > We have been using Airflow for about 3 months now.
> > >
> > > One pain point I felt was: during backfill, if I have two tasks t1 and
> > > t2, with t1 having depends_on_past=True:
> > >               t0 -> t1
> > >               t0 -> t2
> > >
> > > I find that the task t2, which has no past dependency, keeps getting
> > > scheduled. This causes the task t1 to wait for a long time before it
> > > gets scheduled.
> > >
> > > I think this is a good use case for creating "pools" and allocating
> > > slots for each pool. I would also have to use priority_weight and
> > > adjust parallelism!
> > >
> > > Is there a better way to handle this?
> > >
> > >
> > > Also, in general, are there any examples on how to use pools?
> > >
> > > I peeked into airflow/tests/operators/subdag_operator.py and found the
> > > snippet below:
> > >
> > > session = airflow.settings.Session()
> > > pool_1 = airflow.models.Pool(pool='test_pool_1', slots=1)
> > > session.add(pool_1)
> > > session.commit()
> > >
> > > Why do we need Session instance? Do we need to run the below code
> before
> > > creating a pool in code (inside my pipeline.py under dags/ directory):
> > >
> > > pool = (
> > >     session.query(Pool)
> > >     .filter(Pool.pool == 'AIRFLOW-205')
> > >     .first())
> > > if not pool:
> > >     session.add(Pool(pool='AIRFLOW-205', slots=8))
> > >     session.commit()
> > >
> > >
> > > Also, I saw a few places where pool='backfill' is used.
> > >
> > > Is 'backfill' a special pre-defined pool?
> > >
> > >
> > > If not, how do we create different types of pools based on whether it
> > > is backfill or not?
> > >
> > >
> > > All this is being done in pipeline.py script under 'dags/' directory.
> > >
> > >
> > > Thanks,
> > > Harish
> > >
> >
> >
> >
> > --
> > Lance Norskog
> > [email protected]
> > Redwood City, CA
> >
>
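
For what it's worth, the "no more than 3 requests at once" behaviour Lance
describes above is essentially a counting semaphore over task slots. This is
not Airflow code, just a minimal, self-contained sketch of that semantics:

```python
import threading
import time

# Illustrative only: a pool with N slots admits at most N concurrent tasks,
# like Pool(slots=3) in Lance's database example. Names here are made up.
POOL_SLOTS = 3
pool = threading.Semaphore(POOL_SLOTS)

lock = threading.Lock()
active = 0       # tasks currently holding a slot
max_active = 0   # peak concurrency observed

def run_task():
    global active, max_active
    with pool:  # acquire a slot; blocks while all slots are taken
        with lock:
            active += 1
            max_active = max(max_active, active)
        time.sleep(0.01)  # stand-in for the task hitting the shared database
        with lock:
            active -= 1

threads = [threading.Thread(target=run_task) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(max_active)  # never exceeds POOL_SLOTS
```

Ten "tasks" compete for three slots, so peak concurrency stays capped at
three regardless of how many tasks are queued.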
