Is it possible that the SQL you're running to get customer ids is not the
same every time? That's what I (loosely) meant by non-deterministic.

The error message suggests that Airflow is sending out an instruction to
run a DAG called "Pipeline_DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B"
which it expects to find in pipeline.py. However, it is not finding any DAG
object by that name. That's why I'm wondering if your code is always
generating the exact same DAGs.

For example, if customer "DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B" had
been deleted from your database, then Airflow might send out a command to
run that DAG immediately before the deletion and be unable to load it the
next time it parses the file.
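To make that failure mode concrete, here is a minimal pure-Python sketch (no Airflow imports; all names are hypothetical stand-ins) of why DAG ids derived from a live query can break between the moment a run is queued and the moment the worker re-parses the file:

```python
# Sketch: Airflow re-parses the DAG file whenever it needs a DAG, so the
# set of DAG ids is rebuilt from whatever the customer query returns *now*.
def build_dag_ids(customer_ids):
    """Stand-in for parsing pipeline.py: one DAG id per current customer."""
    return {"Pipeline_" + cid for cid in customer_ids}

# First parse: the customer exists, so a run for its DAG gets queued.
queued_dag = "Pipeline_42"
assert queued_dag in build_dag_ids(["7", "42", "99"])

# Customer 42 is deleted before the worker re-parses the file; the lookup
# now fails, which surfaces as "DAG [...] could not be found in pipeline.py".
print(queued_dag in build_dag_ids(["7", "99"]))  # False
```

The point is that the queued run command carries a DAG id frozen at queue time, while the file produces ids from the query result at parse time; any drift between the two produces exactly this error.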

In general, Airflow assumes that DAGs are static and unchanging (or at best
slowly changing). If you want parameterized (per-customer) workflows, it
might be best to create a single DAG whose tasks define your general
workflow (for example, "retrieve customer i information" -> "process
customer i information" -> "store customer i information"). That way your
DAG and tasks remain stable. Perhaps someone else on the list could share
an effective pattern along those lines.
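To sketch what I mean (plain Python, no Airflow imports; the function names are hypothetical): keep one stable DAG/task structure and move the per-customer loop inside the task logic, so the customer list is consulted at execution time rather than at parse time:

```python
# One stable pipeline whose *names* never change; the customer list is
# fetched when the task body runs, not when the file is parsed. In Airflow
# this would be the callable behind a single task (e.g. a PythonOperator).
def get_customer_ids():
    # Hypothetical stand-in for the SQL lookup.
    return ["1", "2", "3"]

def retrieve(customer_id):
    return {"id": customer_id, "raw": "data-for-" + customer_id}

def process(record):
    return {**record, "processed": record["raw"].upper()}

def store(record):
    return record["id"]  # pretend we wrote it to the warehouse

def run_pipeline():
    """Stable task body: retrieve -> process -> store for each customer."""
    return [store(process(retrieve(cid))) for cid in get_customer_ids()]

print(run_pipeline())  # customers can come and go without renaming any DAG
```

With this shape, adding or deleting a customer changes only what the tasks do on their next run, never which DAG/task ids exist, so the scheduler always finds what it queued.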

Now, if that isn't your situation, something strange is happening that's
preventing Airflow from locating the DAG.

On Fri, Apr 29, 2016 at 6:49 PM harish singh <[email protected]>
wrote:

> *"However a DAG with such a complicated name isn't referenced in the
> example code (just "Pipeline" + i). My guess is that the DAG id is being
> generated in a non-deterministic or time-based way, and therefore the run
> command can't find it once the generation criteria change. But hard to say
> without more detail."*
> [response] I am not sure what you mean by a non-deterministic way.
> We are dynamically creating DAGs (1 DAG per customer). So if there are 100
> customers with ids 1, 2, 3, ..., 100, there will be 100 pipelines/DAGs with
> names:
>
> *Pipeline_1, Pipeline_2, Pipeline_3 ...... Pipeline_100*
>
> The customer names I get from our database table which stores this
> information.
>
> so the flow is:
>
> customer_id_list = getCustomerIds()  # runs some SQL
> for customerId in customer_id_list:
>     dag = DAG("Pipeline_" + customerId, default_args=default_args,
>               schedule_interval=datetime.timedelta(minutes=60))
>
>
> Let me know if that helps (or confuses :) ) you in understanding the flow.
> If there is something wrong in what I am doing, I would love to know what
> it is. This seems to be a serious issue (if it is), especially while
> running backfill.
>
>
>
> On Fri, Apr 29, 2016 at 12:19 PM, Jeremiah Lowin <[email protected]> wrote:
>
> > That error message usually means that an error took place inside Airflow
> > before the task ran -- maybe something with setting up the task? The
> > task's state is NONE, meaning it never even started, but the executor is
> > reporting that it successfully sent the command to start the task
> > (SUCCESS)... the culprit is some failure in between.
> >
> > The error message seems to say that the DAG itself couldn't be loaded
> > from the .py file:
> >
> > airflow.utils.AirflowException: DAG
> > [Pipeline_DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B]
> > could not be found in /usr/local/airflow/dags/pipeline.py
> >
> > However a DAG with such a complicated name isn't referenced in the
> > example code (just "Pipeline" + i). My guess is that the DAG id is being
> > generated in a non-deterministic or time-based way, and therefore the
> > run command can't find it once the generation criteria change. But hard
> > to say without more detail.
> >
> >
> >
> > On Fri, Apr 29, 2016 at 3:11 PM Bolke de Bruin <[email protected]>
> > wrote:
> >
> > > I would really like to know what the use case is for a depends_on_past
> > > on the *task* level. What past are you trying to depend on?
> > >
> > > What I am currently assuming from just reading the example and replying
> > > on my phone is that the depends_on_past prevents execution. Have t4 and
> > > t5 ever run?
> > >
> > > Bolke
> > >
> > > Sent from my iPhone
> > >
> > > > On 29 apr. 2016, at 21:01, Chris Riccomini <[email protected]>
> > > > wrote:
> > > >
> > > > @Bolke/@Jeremiah, do you guys think this is related? Full thread is
> > > > here:
> > > >
> > > > https://groups.google.com/forum/?pli=1#!topic/airbnb_airflow/y7wt3I24Rmw
> > > >
> > > >> On Fri, Apr 29, 2016 at 11:57 AM, Chris Riccomini <[email protected]>
> > > >> wrote:
> > > >>
> > > >> Please subscribe to the dev@ mailing list. Sorry to make you jump
> > > >> through hoops--I know it's annoying--but it's for a good cause. ;)
> > > >>
> > > >> This looks like a bug. I'm wondering if it's related to
> > > >> https://issues.apache.org/jira/browse/AIRFLOW-20. Perhaps the
> > > >> backfill is causing a mis-alignment between the dag runs, and
> > > >> depends_on_past logic isn't seeing the prior execution?
> > > >>
> > >
> >
>
