Is it possible that the sql you're running to get customer ids is not the
same every time? That's what I (loosely) meant by non-deterministic.
[response]
The sql is the same. But it is definitely possible that sometimes there is
always a possibility of things going wrong  within the method " getCustomerIds(
//run some sql ) ",  For example ( some kinds of  SqlException )


-- Now if the customer is deleted or lets say airflow cannot find that DAG:
even then, the pipeline should not stall ?? Its ok if the customer is not
found and there is some issue with that ONE dag. But why will the entire
pipeline get stuck?

For now, are DAGs are pretty much static. The customer list is also going
to remain the same. Irrespective of what pattern we choose, the pipeline
should not stop.

This is what my logs show. I monitored for about 45 minutes and no
progress.

*[2016-04-29 06:56:43,013] {jobs.py:813} INFO - [backfill progress]
waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0*
*[2016-04-29 06:56:48,010] {jobs.py:813} INFO - [backfill progress]
waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0*
*[2016-04-29 06:56:53,014] {jobs.py:813} INFO - [backfill progress]
waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0*
*[2016-04-29 06:56:58,010] {jobs.py:813} INFO - [backfill progress]
waiting: 104 | succeeded: 112 | kicked_off: 100 | failed: 0 | wont_run: 0*

On Sat, Apr 30, 2016 at 6:21 AM, Jeremiah Lowin <[email protected]> wrote:

> Is it possible that the sql you're running to get customer ids is not the
> same every time? That's what I (loosely) meant by non-deterministic.
>
> The error message suggests that Airflow is sending out an instruction to
> run a DAG called "Pipeline_DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B"
> which it expects to find in pipeline.py. However, it is not finding any DAG
> object by that name. That's why I'm wondering if your code is always
> generating the exact same DAGs.
>
> For example, if customer "DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B" had
> been deleted from your database, then Airflow might send out a command to
> run that DAG immediately before the deletion and be unable to load it the
> next time it parses the file.
>
> In general, Airflow assumes that DAGs are static and unchanging (or slowly
> changing at best). If you want to have parameterized (per-customer)
> workflows, it might be best to create a single DAG/tasks that define your
> general workflow (for example, "retrieve customer i information" ->
> "process customer i information" -> "store customer i information"). That
> way your DAG and tasks remain stable. Perhaps someone else on the list
> could share an effective pattern along those lines.
>
> Now, if that isn't your situation, something strange is happening that's
> preventing Airflow from locating the DAG.
>
> On Fri, Apr 29, 2016 at 6:49 PM harish singh <[email protected]>
> wrote:
>
> > *"However a DAG with such a complicated name isn't referenced in the
> > examplecode (just "Pipeline" + i). My guess is that the DAG id is being
> > generatedin a non-deterministic or time-based way, and therefore the run
> > commandcan't find it once the generation criteria change. But hard to say
> > withoutmore detail."*
> > [response]  I am not sure what you mean by non-deterministic way.
> > We are dynamically creating DAGs (1 Dag per customer) . So if there are
> 100
> > customers with itds 1, 2,3 .... 100, there will be 100 pipelines/dags
> with
> > names:
> >
> > *Pipeline_1, Pipeline_2, Pipeline_3 ...... Pipeline_100*
> >
> > The customer names I get from our database table which stores this
> > information.
> >
> > so the flow is:
> >
> > customer_id_list ->  getCustomerIds( //run some sql )
> > for each customerId in  customer_id_list:
> >   dag = DAG("Pipeline_"+ customerId, default_args=default_args,
> > *schedule_interval= **datetime.timedelta(minutes=60)*)
> >
> >
> > Let me know if that helps(or confuses :) ) you in understanding the flow.
> > If there is some thing wrong in what I am doing, I would love to know
> what
> > is it. This seems to be a serious issue (if it is) esp while running
> > backfill.
> >
> >
> >
> > On Fri, Apr 29, 2016 at 12:19 PM, Jeremiah Lowin <[email protected]>
> wrote:
> >
> > > That error message usually means that an error took place inside
> Airflow
> > > before the task ran -- maybe something with setting up the task? The
> > task's
> > > state is NONE, meaning it never even started, but the executor is
> > reporting
> > > that it successfully sent the command to start the task (SUCCESS)...
> the
> > > culprit is some failure in between.
> > >
> > > The error message seems to say that the DAG itself couldn't be loaded
> > from
> > > the .py file:
> > >
> > > airflow.utils.AirflowException: DAG
> > > [Pipeline_DEVTEST_CDB_DEVTEST_00_B10C8DBE1CFA89C1F274B]
> > > could not be found in /usr/local/airflow/dags/pipeline.py
> > >
> > > However a DAG with such a complicated name isn't referenced in the
> > example
> > > code (just "Pipeline" + i). My guess is that the DAG id is being
> > generated
> > > in a non-deterministic or time-based way, and therefore the run command
> > > can't find it once the generation criteria change. But hard to say
> > without
> > > more detail.
> > >
> > >
> > >
> > > On Fri, Apr 29, 2016 at 3:11 PM Bolke de Bruin <[email protected]>
> > wrote:
> > >
> > > > I would really like to know what the use case is for a
> depends_on_past
> > on
> > > > the *task* level.  What past are you trying to depend on?
> > > >
> > > > What I am currently assuming from just reading the example and
> replying
> > > on
> > > > my phone is that the depends_on_past prevents execution. Have t4 and
> t5
> > > > ever run?
> > > >
> > > > Bolke
> > > >
> > > > Sent from my iPhone
> > > >
> > > > > On 29 apr. 2016, at 21:01, Chris Riccomini <[email protected]>
> > > > wrote:
> > > > >
> > > > > @Bolke/@Jeremiah, do you guys think this is related? Full thread is
> > > here:
> > > > >
> > >
> https://groups.google.com/forum/?pli=1#!topic/airbnb_airflow/y7wt3I24Rmw
> > > > >
> > > > >> On Fri, Apr 29, 2016 at 11:57 AM, Chris Riccomini <
> [email protected]
> > >
> > > > wrote:
> > > > >>
> > > > >> Please subscribe to the dev@ mailing list. Sorry to make you jump
> > > > through
> > > > >> hoops--I know it's annoying--but it's for a good cause. ;)
> > > > >>
> > > > >> This looks like a bug. I'm wondering if it's related to
> > > > >> https://issues.apache.org/jira/browse/AIRFLOW-20. Perhaps the
> > > backfill
> > > > is
> > > > >> causing a mis-alignment between the dag runs, and depends_on_past
> > > logic
> > > > >> isn't seeing the prior execution?
> > > > >>
> > > >
> > >
> >
>

Reply via email to