Thank you Sid!

I would like to float an idea. This is not even half baked... just to
prompt discussion!

One of the big frictions is that Airbnb carries a disproportionate share of
the burden of testing releases, and I believe that's largely for both
historical and inertial reasons. We want to bring more companies into the
release testing loop. However, that's not without its own set of issues. The
primary one is that if a bug is discovered, either the company that
discovered it must fix it privately on their own infrastructure OR they
must create a simple, replicable example so the problem can be fixed in the
open. Neither option is appealing, as Airbnb is experiencing today.

So I'd like to float the idea of building a DAG sanitization tool (or DAG
mock tool). This tool would read in a DAG and spit out a "dummy" version of
the same DAG. Dependencies, schedules, and triggers would all be preserved,
but names and operators would be anonymized.

What I'm trying to do is separate "Airflow" from "Things Built With
Airflow". If my DAG fails but my sanitized DAG runs, then the fault is
probably my own (maybe my Python code is broken). However, if the sanitized
DAG fails, then the fault is almost certainly Airflow's. Sanitized DAGs could be
shared with the community since they would have no identifying marks and
wouldn't actually do anything.
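
To make the idea a bit more concrete, here is a very rough sketch in plain
Python. It assumes the DAG has already been dumped into a simple dictionary;
that dictionary layout and the name sanitize_dag are made up for illustration
and are not part of Airflow. It only shows the renaming step: task names and
operator types are replaced, while dependencies, schedule, and trigger rules
carry over.

```python
# Hypothetical sketch: the dict layout and function name are illustrative
# assumptions, not an Airflow API.

def sanitize_dag(dag):
    """Return an anonymized copy of a DAG description.

    Dependencies, schedule, and trigger rules are preserved;
    task names and operator types are replaced with placeholders.
    """
    # Assign placeholder names in sorted order so the output is deterministic.
    name_map = {
        task_id: "task_%d" % i
        for i, task_id in enumerate(sorted(dag["tasks"]))
    }
    sanitized_tasks = {}
    for task_id, task in dag["tasks"].items():
        sanitized_tasks[name_map[task_id]] = {
            # Every operator becomes a no-op placeholder.
            "operator": "DummyOperator",
            # "all_success" is assumed as the default trigger rule.
            "trigger_rule": task.get("trigger_rule", "all_success"),
            # Rewrite upstream references using the same name mapping.
            "upstream": sorted(name_map[u] for u in task.get("upstream", [])),
        }
    return {
        "dag_id": "sanitized_dag",
        "schedule_interval": dag["schedule_interval"],
        "tasks": sanitized_tasks,
    }


# Made-up example input.
example = {
    "dag_id": "daily_revenue_report",
    "schedule_interval": "@daily",
    "tasks": {
        "extract_bookings": {"operator": "HiveOperator", "upstream": []},
        "load_report": {
            "operator": "PythonOperator",
            "upstream": ["extract_bookings"],
            "trigger_rule": "all_done",
        },
    },
}

print(sanitize_dag(example))
```

A real tool would also have to deal with the complications below, and would
need to turn the sanitized description back into a runnable DAG file.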

Complications (there are many):
- What should Operators be replaced with? DummyOperators? Maybe the "base"
Airflow Operators could also implement sanitized versions of themselves.
- XComs (and any other objects keyed by strings) -- how should they be
anonymized?

Food for thought...

J


On Wed, Jun 1, 2016 at 6:33 PM siddharth anand <[email protected]> wrote:

>  Hi Folks!
> We held our first contributor meeting this morning. I was about 20 minutes
> late, but did ask others in attendance for their input before compiling
> these minutes.
>
> *Agenda* :
>
> https://cwiki.apache.org/confluence/display/AIRFLOW/Announcements#Announcements-May27,2016
>
> *Outcomes*:
>
>    - We need better and more test coverage
>       - Committers should ask PR authors to include tests when possible.
>       There may be some exceptions to this, e.g. Google Cloud Storage,
>       etc., where it is difficult to stub out or mock storage
>       - End-to-end dag testing with a corpus of test dags
>          - Max, you have a PR (to approve) in this regard
>       - A reiteration of already ratified rules:
>          - Committers should follow the instructions outlined in the
>          Committers' Guide
>    - A few of the non-Airbnb committers will drive the next release,
>    including baking release candidates in our own production and
>    pre-production environments
>       - Currently, Sid, Bolke, and Chris voiced interest in driving this,
>       but all from the community are welcome to help with release candidate
>       certification
>    - Working collaboratively as a community
>       - Airbnb's roadmap for Airflow does not appear to be public
>          - https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap
>          - Large PRs do not
>       - For large PRs, first put up and socialize a design document
>       - Authors of PRs should seek out the right committers for PR reviews
>       - Leverage the dev list for conversations
>
> -s
>
>
