Re: Airflow Contributors Meeting (June 01, 2016) : Minutes

Chris Riccomini Fri, 03 Jun 2016 18:00:42 -0700

Hey Jeremiah,

Something that's been floating in my head is a basic assertion script for
DAGs that will validate things are as expected. This can be used to monitor
test DAGs (especially if we do nightly builds). The assertions could be
things like:


* This DAG should have an execution date every N minutes
* This DAG should finish within N minutes of starting
* There should be no task failures
* XCom values for this DAG should be ..

Basically, something that can inspect the Airflow DB's state, and compare
it with what's expected, and fail if they don't match.
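
A minimal sketch of the first three assertions, to make the idea concrete. It
assumes the run data has already been read out of the Airflow DB into plain
records; DagRunRecord and check_dag_runs are hypothetical names for
illustration, not Airflow classes or APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


# Hypothetical, simplified stand-in for rows read from the Airflow DB.
# This is not one of Airflow's own model classes.
@dataclass
class DagRunRecord:
    execution_date: datetime
    start_date: datetime
    end_date: Optional[datetime]
    task_states: List[str]


def check_dag_runs(runs, interval, max_duration):
    """Return a list of assertion failures; an empty list means all checks passed."""
    failures = []
    ordered = sorted(runs, key=lambda r: r.execution_date)

    # Assertion: there should be an execution date every `interval`.
    for prev, cur in zip(ordered, ordered[1:]):
        gap = cur.execution_date - prev.execution_date
        if gap != interval:
            failures.append(
                f"gap of {gap} after {prev.execution_date}, expected {interval}"
            )

    for run in ordered:
        # Assertion: each run should finish within `max_duration` of starting.
        if run.end_date is None:
            failures.append(f"{run.execution_date}: run has not finished")
        elif run.end_date - run.start_date > max_duration:
            failures.append(f"{run.execution_date}: ran longer than {max_duration}")

        # Assertion: there should be no task failures.
        if "failed" in run.task_states:
            failures.append(f"{run.execution_date}: at least one task failed")

    return failures
```

A monitoring script could call check_dag_runs for each test DAG and exit
non-zero whenever the returned list is not empty.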

I think if we had these two tools, we could have a fairly nice way to
generate some test DAGs, and verify that they work consistently (even on
master).
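
For the sanitizer quoted below, the core renaming step might look roughly like
this. It is only a sketch: it assumes the DAG has already been reduced to a
mapping from each task name to its upstream task names, and sanitize_dag is a
hypothetical helper, not anything that exists in Airflow. Operators, schedules
and XComs are left out.

```python
from typing import Dict, List


def sanitize_dag(upstream: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Rename every task to task_0, task_1, ... while keeping the dependency edges."""
    # Collect every task name, including ones that appear only as an upstream.
    names = sorted(set(upstream) | {u for ups in upstream.values() for u in ups})
    # Sorting makes the alias mapping deterministic for a given set of names.
    alias = {name: f"task_{i}" for i, name in enumerate(names)}
    return {
        alias[task]: sorted(alias[u] for u in upstream.get(task, []))
        for task in names
    }
```

The output keeps the shape of the graph but none of the original task names,
which is the part that would make a DAG shareable.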

Thoughts?

Cheers,
Chris

On Thu, Jun 2, 2016 at 5:26 AM, Jeremiah Lowin <[email protected]> wrote:

> Thank you Sid!
>
> I would like to float an idea. This is not even half baked... just to
> prompt discussion!
>
> One of the big frictions is that Airbnb carries a disproportionate share of
> the burden of testing releases, and I believe that's largely for both
> historical and inertial reasons. We want to bring more companies into the
> release testing loop. However that's not without its own set of issues. The
> primary one is that if a bug is discovered, either the company that
> discovered it must fix it privately on their own infrastructure OR they
> must create a simple, replicable example so the problem can be fixed in the
> open. Neither option is appealing, as Airbnb is experiencing today.
>
> So I'd like to float the idea of building a DAG sanitization tool (or DAG
> mock tool). This tool would read in a DAG and spit out a "dummy" version of
> the same DAG. Dependencies, schedules, triggers would all be maintained but
> names and operators would be anonymized.
>
> What I'm trying to do is separate "Airflow" from "Things Built With
> Airflow". If my DAG fails but my sanitized DAG runs, then the fault is
> probably my own (maybe my Python code is broken). However, if the sanitized
> DAG fails, then the fault is certainly Airflow's. Sanitized DAGs could be
> shared with the community since they would have no identifying marks and
> wouldn't actually do anything.
>
> Complications (there are many):
> - What should Operators be replaced with? DummyOperators? Maybe the "base"
> Airflow Operators also implement sanitized versions of themselves.
> - XComs (and any other objects keyed by strings) -- how should they be
> anonymized?
>
> Food for thought...
>
> J
>
>
> On Wed, Jun 1, 2016 at 6:33 PM siddharth anand <[email protected]> wrote:
>
> >  Hi Folks!
> > We held our first contributor meeting this morning. I was about 20 minutes
> > late, but did ask others in attendance for their input before compiling
> > these minutes.
> >
> > *Agenda* :
> >
> >
> > https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-May27,2016
> >
> > *Outcomes*:
> >
> >    - We need better and more test coverage
> >       - Committers should ask PR authors to include tests when possible.
> >       There may be some exceptions to this, e.g. Google Cloud Storage,
> >       where it is difficult to stub out or mock storage
> >       - End-to-end dag testing with a corpus of test dags
> >          - Max, you have a PR (to approve) in this regard
> >       - A reiteration of already ratified rules:
> >       - Committers should follow the instructions outlined on Committers'
> >       Guide
> >    - A few of the non-Airbnb committers will drive the next release,
> >    including baking release candidates in our own production and
> >    pre-production environments
> >       - Currently, Sid, Bolke, and Chris voiced interest in driving this,
> >       but all from the community are welcome to help with release
> >       candidate certification
> >    - Working collaboratively as a community
> >       - Airbnb's roadmap for Airflow does not appear to be public
> >          - https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap
> >          - Large PRs do not
> >       - For large PRs, first put up and socialize a design document
> >       - Authors of PRs should seek out the right committers for PR
> >       reviews
> >       - Leverage the dev list for conversations
> >
> > -s
> >
> >
>
