Hi all,

Has any progress been made on this effort? I think it's really interesting and very important for driving further adoption of Apache Airflow.
Small plug: I started a public repo to test out my ideas as written in this thread: https://github.com/gtoonstra/airflow-hovercraft

The "reference implementation" may be a bit too ambitious, but we can see where it goes. The intention is to explore what engineers run into when dealing with Airflow, and I think having a solid testing approach is paramount to delivering reliably and on time. I'm not intending to copy the core/contrib stuff that is there; I'm mostly wrapping it to be able to instrument it and run the tests in the way envisioned: against real backends, and with simulated data at higher levels of workflow execution using Python behavior testing.

Anyone willing to participate, privmsg me if you're interested and I can add you.

Rgds,

Gerard

On Thu, May 18, 2017 at 2:00 PM, Gerard Toonstra <[email protected]> wrote:

On Tue, May 9, 2017 at 9:46 PM, Arthur Wiedmer <[email protected]> wrote:

Hi,

I would love to see if we can contribute some of the work we have done internally at Airbnb to support some testing of DAGs. We have a long way to go though :)

Best,
Arthur

On Tue, May 9, 2017 at 12:34 PM, Sam Elamin <[email protected]> wrote:

Thanks Gerard and Laura, I have created an email thread as agreed in the call, so let's take the discussion there. If anyone else is interested in helping us build this library, please do get in touch!

On Tue, May 9, 2017 at 5:40 PM, Laura Lorenz <[email protected]> wrote:

Good points @Gerard. I think the distinctions you make between different testing considerations could help us focus our efforts. Here's my 2 cents in the buckets you describe; I'm wondering if any of these use cases align with anyone else and can help narrow our scope, and if I understood you right @Gerard:

Regarding platform code: For our own platform code (i.e. custom Operators and Hooks), we have our CI platform running unit tests on their construction and, in the case of hooks, integration tests on connectivity. The latter involves us setting up test integration services (i.e. a test MySQL process) which we start up as docker containers, and we flip our Airflow's configuration to point at them during testing using environment variables. It seems from a browse of Airflow's tests that operators and hooks are mostly unit tested, with the integrations mocked or skipped (e.g. https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_jira_hook.py#L40-L41 or https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_sqoop_hook.py#L123-L125). If the hook is using some other, well-tested library to actually establish the connection, the case can probably be made that custom operator and hook authors don't need integration tests; and since the normal unittest library is enough to handle these, they might not need to be in scope for a new testing library.
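As a minimal sketch of the unit-test style described above (in the spirit of the linked jira/sqoop hook tests): the hook's construction is tested directly while the integration is mocked away. `ReportingHttpHook` is a hypothetical stand-in for an in-house hook, and the example assumes the `requests` library is installed:

```python
import unittest
from unittest import mock

import requests  # the underlying, well-tested client library


class ReportingHttpHook(object):
    """Hypothetical custom hook that wraps a well-tested client library."""

    def __init__(self, conn_id='reporting_default'):
        self.conn_id = conn_id

    def get_conn(self):
        # In a real hook this would resolve self.conn_id to host/credentials.
        session = requests.Session()
        session.headers['X-Conn-Id'] = self.conn_id
        return session


class TestReportingHttpHook(unittest.TestCase):
    @mock.patch('requests.Session')
    def test_get_conn_uses_configured_conn_id(self, mock_session):
        # The real session class is mocked, so no connectivity is needed;
        # we only assert the hook wires its configuration through correctly.
        hook = ReportingHttpHook(conn_id='reporting_test')
        conn = hook.get_conn()
        mock_session.assert_called_once_with()
        conn.headers.__setitem__.assert_called_once_with(
            'X-Conn-Id', 'reporting_test')


if __name__ == '__main__':
    unittest.main()
```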
Regarding data manipulation functions of the business code: For us, we run tests on each operator in each DAG on CI, seeded with test input data and asserted against known output data, all of which we have compiled over time to represent different edge cases we expect or have seen. So this is a test at the level of the operator as described in a given DAG. Because we only describe edge cases we have seen or can predict, it's a very reactive way to handle testing at this level.

If I understand your idea right, another way to test (or at least surface errors) at this level is: given you have a DAG that is resilient against arbitrary data failures, your DAG should include a validation task/report at its end, or a test suite should run daily against the production error log for that DAG to surface errors your business code encountered on production data. I think this is really interesting, and it reminds me of an Airflow video I saw once (can't remember who gave the talk) on a DAG whose last task self-reported error counts and rows lost. If implemented as a test suite you would run against production, this might be a direction we would want a testing library to go in.

Regarding the workflow correctness of the business code: What we set out to do on our side was a hybrid version of your items 1 and 2, which we call "end-to-end tests": calling a whole DAG against 'real' existing systems (though really they are test docker containers of the processes we need, MySQL and Neo4j specifically, which we use environment variables to switch our Airflow to when instantiating hooks etc.), seeded with test input files for services that are hard to set up (i.e. third-party APIs we ingest data from). Since the whole DAG is seeded with known input data, this gives us a way to compare the last output of a DAG to a known file, so that if any workflow change OR business logic change in the middle affected the final output, we would know as part of our test suite instead of when production breaks. In other words, a way to test for a regression of the whole DAG. So this is the framework we were thinking needed to be created, and it is a direction we could go with a testing library as well.

This doesn't get to your point of determining what workflow was used, which is interesting, just not a use case we have encountered yet (we only have deterministic DAGs). In my mind, in this case we would want a testing suite to be able to more or less turn some DAGs "on" against seeded input data and mocked or test integration services, let a scheduler go at it, and then check the metadata database for what workflow happened (and, if we had test integration services, maybe also check the output against the known output for the seeded input). I can definitely see your suggestion of developing instrumentation to inspect a followed workflow as a useful addition a testing library could include.
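A rough sketch of the environment-variable switch described above, not a definitive implementation: Airflow resolves a connection from an `AIRFLOW_CONN_<CONN_ID>` environment variable (as a URI) before consulting the metadata database, so a test suite can repoint a DAG's hooks at throwaway docker services. The `etl_mysql` connection id, credentials, port, dag id and fixture paths are all assumptions for illustration:

```python
import os
import subprocess
import unittest


class DagEndToEndTest(unittest.TestCase):

    @classmethod
    def setUpClass(cls):
        # Bring up the throwaway backing service for the test stack.
        subprocess.check_call(['docker-compose', 'up', '-d', 'mysql-test'])
        # Any hook asking for the 'etl_mysql' connection now talks to the
        # container instead of the real warehouse.
        os.environ['AIRFLOW_CONN_ETL_MYSQL'] = (
            'mysql://test_user:test_pass@localhost:33306/test_db')

    def test_dag_regression(self):
        # Seed known input, run the DAG for one schedule interval, then
        # diff the final output against a known-good fixture.
        subprocess.check_call(
            ['airflow', 'backfill', 'etl_dag',
             '-s', '2017-05-01', '-e', '2017-05-01'])
        # ... compare output table/file to tests/fixtures/expected_output.csv
```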
To some degree our end-to-end DAG tests overlap in our workflow with your point 3 (UAT environment), but we've found that more useful for testing whether "wild data" causes uncaught exceptions or integration errors with difficult-to-mock third-party services, not DAG-level logic regressions, since the input data is unknown and thus we can't compare to a known output in this case; we depend instead on a fallible human QA, or just accept the DAG running with no exceptions as passing UAT.

Laura

On Tue, May 9, 2017 at 2:15 AM, Gerard Toonstra <[email protected]> wrote:

Very interesting video. I was unable to take part and have watched only part of it for now. Let us know where the discussion is being moved to.

The confluence does indeed seem to be the place to put final conclusions and thoughts.

For Airflow, I like to make a distinction between "platform" and "business" code. The platform code is the hooks and operators; it provides the capabilities of what your ETL system can do. You'll test this code with a lot of thoroughness, such that each component behaves how you'd expect, judging from the constructor interface. Any abstractions in there (like copying files to GCS) should be kept as hidden as possible (retries, etc.).

The "business" code is what runs on a daily basis. This can be divided into another two concerns for testing:

1. The workflow: the code between the data manipulation functions that decides which operators get called.
2. The data manipulation functions.

I think it's good practice to run tests on "2" on a daily basis and not just once on CI. The reason is that there are too many unforeseen circumstances where data can get into a bad state. So such tests shouldn't run once in a highly controlled environment like CI, but daily in a less predictable environment like production, where all kinds of weird things can happen that you'll be able to catch with proper checks in place. Even if the checks are too rigorous, you can skip them and improve on them, so that they fit what goes on in your environment to the best of your ability.
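A minimal sketch of such a daily production check, assuming Airflow 1.x-era import paths; the `warehouse_db` connection, table name, row-count threshold and dag id are illustrative, and Airflow's CheckOperator family could serve the same purpose:

```python
from datetime import datetime

from airflow import DAG
from airflow.hooks.mysql_hook import MySqlHook
from airflow.operators.python_operator import PythonOperator


def check_daily_load(**context):
    # Fail loudly if today's partition looks wrong; alerting on this task
    # catches bad data states that CI could never have predicted.
    hook = MySqlHook(mysql_conn_id='warehouse_db')
    row = hook.get_first(
        "SELECT COUNT(*) FROM daily_events WHERE ds = %s",
        parameters=(context['ds'],))
    if row[0] < 1000:  # illustrative threshold
        raise ValueError(
            'Suspiciously few rows (%s) for %s' % (row[0], context['ds']))


dag = DAG('daily_etl', start_date=datetime(2017, 5, 1),
          schedule_interval='@daily')

validate = PythonOperator(
    task_id='validate_daily_events',
    python_callable=check_daily_load,
    provide_context=True,  # Airflow 1.x: pass ds/execution_date as kwargs
    dag=dag)

# Upstream ETL tasks would then set: load_task >> validate
```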
Which mostly leaves testing workflow correctness and platform code. What I had intended to do was:

1. Test the platform code against real existing systems (or maybe docker containers), to test their behavior in success and failure conditions.
2. Create workflow scripts for testing the workflow; this probably requires some specific changes in hooks, which wouldn't call out to other systems, but would just pick up small files you prepare from a testing repo and pass them around. The test script could also simulate unavailability, etc. This relieves you of the huge responsibility of setting up systems and docker containers and loading them with data. Airflow sets up pretty quickly as a docker container, and you can also start up a sample database with that. Afterwards, from a test script, you can check which workflow was followed by inspecting the database, so develop some instrumentation for that (see the sketch after this list).
3. Test the data manipulation in a UAT environment, mirroring the runs in production to some extent. That would be the place to verify that the data comes out correctly, and also to show people what kind of monitoring is in place to double-check that.
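A rough sketch of that instrumentation, assuming direct access to Airflow's metadata database through its own session and models; the dag id and expected task list are illustrative:

```python
from airflow import settings
from airflow.models import TaskInstance


def followed_workflow(dag_id, execution_date):
    """Return the task_ids that actually succeeded for one DAG run, by
    inspecting Airflow's metadata database."""
    session = settings.Session()
    try:
        tis = (session.query(TaskInstance)
               .filter(TaskInstance.dag_id == dag_id,
                       TaskInstance.execution_date == execution_date,
                       TaskInstance.state == 'success')
               .all())
        return sorted(ti.task_id for ti in tis)
    finally:
        session.close()


# In a test, after letting a scheduler/backfill run the DAG against the
# prepared small files, assert the branch you expected was taken, e.g.:
# assert followed_workflow('etl_dag', dt) == ['extract', 'branch_a', 'load']
```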
On Tue, May 9, 2017 at 1:14 AM, Arnie Salazar <[email protected]> wrote:

Scratch that. I see the whole video now.

On Mon, May 8, 2017 at 3:33 PM Arnie Salazar <[email protected]> wrote:

Thanks Sam!

Is there a part 2 to the video? If not, can you post the "next steps" notes you took whenever you have a chance?

Cheers,
Arnie

On Mon, May 8, 2017 at 3:08 PM Sam Elamin <[email protected]> wrote:

Hi Folks

For those of you who missed it, you can catch the discussion from the link in this tweet: https://twitter.com/samelamin/status/861703888298225670

Please do share, and feel free to get involved; the more feedback we get, the better the library we create will be :)

Regards
Sam

On Mon, May 8, 2017 at 9:43 PM, Sam Elamin <[email protected]> wrote:

Bit late notice, but the call is happening today at 9:15 UTC, so in about 30 mins or so.

It will be recorded, but if anyone would like to join in on the discussion, the hangout link is https://hangouts.google.com/hangouts/_/mbkr6xassnahjjonpuvrirxbnae

Regards
Sam

On Fri, 5 May 2017 at 21:35, Ali Uz <[email protected]> wrote:

I am also very interested in seeing how this turns out. Even though we don't have a testing framework in place on the project I am working on, I would very much like to contribute to some general framework for testing DAGs.

As of now we are just implementing dummy tasks that test our actual tasks and verify that the given input produces the expected output. Nothing crazy, and certainly not flexible in the long run.

On Fri, 5 May 2017 at 22:59, Sam Elamin <[email protected]> wrote:

Haha yes Scott, you are in!

On Fri, 5 May 2017 at 20:07, Scott Halgrim <[email protected]> wrote:

Sounds A+ to me. By “both of you” did you include me? My first response was just to your email address.

On May 5, 2017, 11:58 AM -0700, Sam Elamin <[email protected]> wrote:

Ok sounds great folks

Thanks for the detailed response, Laura! I'll invite both of you to the group if you are happy, and we can schedule a call for next week?

How does that sound?

On Fri, 5 May 2017 at 17:41, Laura Lorenz <[email protected]> wrote:

We do! We developed our own little in-house DAG test framework, which we could share insights on, and we would love to hear what other folks are up to. Basically we mock a DAG's input data, use the BackfillJob API directly to call a DAG in a test, and compare its outputs to the intended result given the inputs. We use docker/docker-compose to manage services, and split our dev and test stack locally so that the tests have their own scheduler and metadata database, and so that our CI tool knows how to construct the test stack as well.

We co-opted the BackfillJob API for our own purposes here, but it seemed overly complicated and fragile to start and interact with our own in-test-process executor like we saw in a few of the tests in the Airflow test suite. So I'd be really interested in finding a way to streamline how to describe a test executor for both the Airflow test suite and people's own DAG testing, and make that a first-class type of API.

Laura
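A minimal sketch of driving a DAG that way with the 1.x-era internals, assuming the test stack's metadata database is already initialized; the dag id and dates are illustrative, and this uses the same mechanism `airflow backfill` uses under the hood:

```python
from datetime import datetime

from airflow.jobs import BackfillJob
from airflow.models import DagBag


def run_dag_once(dag_id, execution_date):
    # Load the DAG definition from the configured dags folder and run one
    # schedule interval synchronously inside the test process.
    dag = DagBag().get_dag(dag_id)
    job = BackfillJob(dag=dag,
                      start_date=execution_date,
                      end_date=execution_date)
    job.run()


# In a test: seed inputs, run, then compare outputs to fixtures, e.g.:
# run_dag_once('etl_dag', datetime(2017, 5, 1))
```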
On Fri, May 5, 2017 at 11:46 AM, Sam Elamin <[email protected]> wrote:

Hi All

A few people in the Spark community are interested in writing a testing library for Airflow. We would love anyone who uses Airflow heavily in production to be involved.

At the moment (AFAIK) testing your DAGs is a bit of a pain, especially if you want to run them in a CI server.

Is anyone interested in being involved in the discussion?

Kind Regards
Sam
