Hey Harish,

One thing that I'm not clear on is whether backfill even honors pools at
all. I believe backfill currently starts its own scheduler outside of the
main scheduler process. As a result, I think the pools are completely
disregarded. Bolke/Jeremiah/Paul can correct me if I'm wrong.

Cheers,
Chris

On Mon, Jun 20, 2016 at 7:46 PM, Lance Norskog <[email protected]>
wrote:

> One reason to use Pools is because you have tasks in different DAGs that
> all use the same resource, like a database. A Pool lets you say, "I will
> send no more than 3 requests to this database at once". However, there are
> bugs in the scheduler and it is possible to have many active tasks
> overscheduled against a pool.
>
> You can create a pool in the Admin->Pools drop-down. You don't need a
> script.
>
> On Mon, Jun 20, 2016 at 2:46 PM, harish singh <[email protected]>
> wrote:
>
> > Hi,
> >
> > We have been using airflow for few 3 months now.
> >
> > One pain I felt was, during backfill if I have 2 tasks t1 and t2 - with
> t1
> > having depends_on_past=true,
> >               t0 -> t1
> >               t0 -> t2
> >
> > I find that the task t2 with no past dependency keeps getting scheduled.
> > This causes the task t1 to wait for a long time before it gets scheduled.
> >
> > I think this is a good use case for creating "pools" and allocate slots
> for
> > each pool.
> > Also, I will have to use priority_weights.  And adjust parallelism!!!
> >
> > Is there a better way to handle this?
> >
> >
> > Also, in general, are there any examples on how to use pools?
> >
> > I peeked into* airflow/tests/operators/subdag_operator.py *and found the
> > below snippet:
> >
> > session = airflow.settings.Session()
> > pool_1 = airflow.models.Pool(pool='test_pool_1', slots=1)
> > session.add(pool_1)
> > session.commit()
> >
> > Why do we need Session instance? Do we need to run the below code before
> > creating a pool in code (inside my pipeline.py under dags/ directory):
> >
> > *pool = (
> >     session.query(Pool)
> >     .filter(Pool.pool == 'AIRFLOW-205')
> >     .first())
> > if not pool:
> >     session.add(Pool(pool='AIRFLOW-205', slots=8))
> >     session.commit()*
> >
> >
> > Also, I saw few places where pool: 'backfill'  is used?
> >
> > Is 'backfill' a special pre-defined pool?
> >
> >
> > If not, how do we create different types of pools based on whether it
> > is backfill or not?
> >
> >
> > All this is being done in pipeline.py script under 'dags/' directory.
> >
> >
> > Thanks,
> > Harish
> >
>
>
>
> --
> Lance Norskog
> [email protected]
> Redwood City, CA
>

Reply via email to