Good points @Gerard. I think the distinctions you make between different testing considerations could help us focus our efforts. Here are my two cents in the buckets you describe (assuming I understood you right, @Gerard); I'm wondering whether any of these use cases align with anyone else's and can help narrow our scope:
Regarding platform code: For our own platform code (i.e. custom operators and hooks), our CI platform runs unit tests on their construction and, in the case of hooks, integration tests on connectivity. The latter involves setting up test integration services (e.g. a test MySQL process) that we start as Docker containers; during testing we flip our Airflow configuration to point at them using environment variables. From a browse of Airflow's own tests, it seems operators and hooks are mostly unit tested, with the integrations mocked or skipped (e.g. https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_jira_hook.py#L40-L41 or https://github.com/apache/incubator-airflow/blob/master/tests/contrib/hooks/test_sqoop_hook.py#L123-L125). If the hook uses some other, well-tested library to actually establish the connection, a case can probably be made that custom operator and hook authors don't need integration tests; and since the standard unittest library is enough to handle these, they might not need to be in scope for a new testing library at all.

Regarding data manipulation functions of the business code: We run tests on each operator in each DAG on CI, seeded with test input data and asserted against known output data, all of which we have compiled over time to represent the edge cases we expect or have seen. So this is a test at the level of the operator as described in a given DAG. Because we only describe edge cases we have already seen or can predict, it's a very reactive way to handle testing at this level. If I understand your idea right, another way to test (or at least surface errors) at this level is: given a DAG that is resilient against arbitrary data failures, the DAG should include a validation task/report at its end, or a test suite should run daily against the production error log for that DAG to surface errors your business code encountered on production data.
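For what it's worth, the mocked-hook style in those linked Airflow tests looks roughly like the sketch below. The hook class here is a made-up stand-in for illustration, not from our codebase or Airflow's; only the unittest/mock mechanics are the point:

```python
import unittest
from unittest import mock


class MyMySqlHook:
    """Illustrative custom hook; a real one would build its connection
    from an Airflow Connection (e.g. configured via environment variables)."""

    def get_conn(self):
        # In production this would open a real MySQL connection.
        raise RuntimeError("would open a real MySQL connection")

    def get_records(self, sql):
        cursor = self.get_conn().cursor()
        cursor.execute(sql)
        return cursor.fetchall()


class TestMyMySqlHook(unittest.TestCase):
    """Behavior is unit tested; connectivity is mocked out, in the
    spirit of the linked test_jira_hook / test_sqoop_hook tests."""

    def test_get_records_queries_through_connection(self):
        fake_cursor = mock.Mock()
        fake_cursor.fetchall.return_value = [(1, "a"), (2, "b")]
        fake_conn = mock.Mock()
        fake_conn.cursor.return_value = fake_cursor
        # Patch out the only method that touches the network.
        with mock.patch.object(MyMySqlHook, "get_conn", return_value=fake_conn):
            rows = MyMySqlHook().get_records("SELECT id, name FROM t")
        self.assertEqual(rows, [(1, "a"), (2, "b")])
        fake_cursor.execute.assert_called_once_with("SELECT id, name FROM t")
```

Run with `python -m unittest`; everything here is the standard library, which is part of why I suspect this bucket may not need a new testing library.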
I think this is really interesting and reminds me of an Airflow talk I saw once (can't remember who gave it) about a DAG whose last task self-reported error counts and rows lost. If implemented as a test suite you run against production, this might be a direction we'd want a testing library to go.

Regarding the workflow correctness of the business code: What we set out to do on our side is a hybrid of your items 1 and 2, which we call "end-to-end tests": call a whole DAG against 'real' existing systems (really test Docker containers of the processes we need, MySQL and Neo4j specifically, which we switch our Airflow to use via environment variables when instantiating hooks etc.), seeded with test input files for services that are hard to set up (e.g. third-party APIs we ingest data from). Since the whole DAG is seeded with known input data, we can compare the final output of the DAG to a known file, so if any workflow change OR business logic change in the middle affects the final output, we find out in our test suite instead of when production breaks. In other words, it's a regression test of the whole DAG. This is the framework we were thinking needed to be created, and a direction a testing library could go as well. It doesn't get to your point of determining which workflow was used, which is interesting, just not a use case we have encountered yet (we only have deterministic DAGs). In my mind, for that case we would want a testing suite that can more or less turn some DAGs "on" against seeded input data and mocked or test integration services, let a scheduler go at it, and then check the metadata database for what workflow happened (and, with test integration services, maybe also check the output against the known output for the seeded input).
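To make the "check the metadata database for what workflow happened" idea concrete: after a run, the scheduler's task_instance table records which tasks actually ran and their final states, so a test can assert on that directly. A minimal sketch, assuming a SQLite metadata database; the table and column names are Airflow's task_instance schema, but the helper functions themselves are made up for illustration:

```python
import sqlite3


def workflow_states(db_path, dag_id, execution_date):
    """Return {task_id: state} for one DAG run, read straight from the
    metadata database (sketch; assumes a SQLite metadata db)."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT task_id, state FROM task_instance "
            "WHERE dag_id = ? AND execution_date = ?",
            (dag_id, execution_date),
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


def assert_workflow(db_path, dag_id, execution_date, expected):
    """Fail if the workflow that actually ran differs from `expected`,
    e.g. {"extract": "success", "transform": "success"}."""
    actual = workflow_states(db_path, dag_id, execution_date)
    assert actual == expected, "workflow mismatch: %r != %r" % (actual, expected)
```

A test would seed input data, let the scheduler (or a backfill) run the DAG against the test stack, then call `assert_workflow(...)` and, where there are test integration services, also diff the final output against the known-good file.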
I can definitely see your suggestion of developing instrumentation to inspect the workflow that was followed as a useful addition a testing library could include. To some degree our end-to-end DAG tests overlap in our workflow with your point 3 (UAT environment), but we've found UAT more useful for testing whether "wild data" causes uncaught exceptions or integration errors with difficult-to-mock third-party services, not DAG-level logic regressions: since the input data is unknown, we can't compare to a known output in this case, and instead depend on a fallible human QA, or just accept the DAG running with no exceptions as passing UAT.

Laura

On Tue, May 9, 2017 at 2:15 AM, Gerard Toonstra <[email protected]> wrote:

> Very interesting video. I was unable to take part. I watched only part of
> it for now.
> Let us know where the discussion is being moved to.
>
> The confluence does indeed seem to be the place to put final conclusions
> and thoughts.
>
> For airflow, I like to make a distinction between "platform" and "business"
> code. The platform code are the hooks and operators and provide the
> capabilities of what your ETL system can do. You'll test this code with a
> lot of thoroughness, such that each component behaves how you'd expect,
> judging from the constructor interface. Any abstractions in there (like
> copying files to GCS) should be kept as hidden as possible (retries, etc).
>
> The "business" code is what runs on a daily basis. This can be divided in
> another two concerns for testing:
>
> 1. The workflow, the code between the data manipulation functions that
>    decides which operators get called
> 2. The data manipulation function.
>
> I think it's good practice to run tests on "2" on a daily basis and not
> just once on CI. The reason is that there are too many unforeseen
> circumstances where data can get into a bad state.
> So such tests shouldn't run once on a highly controlled environment like
> CI, but run daily in a less predictable environment like production, where
> all kind of weird things can happen, but you'll be able to catch with
> proper checks in place. Even if the checks are too rigorous, you can skip
> them and improve on them, so that it fits what goes on in your environment
> to your best ability.
>
> Which mostly leaves testing workflow correctness and platform code. What I
> had intended to do was:
>
> 1. Test the platform code against real existing systems (or maybe docker
>    containers), to test their behavior in success and failure conditions.
> 2. Create workflow scripts for testing the workflow; this probably
>    requires some specific changes in hooks, which wouldn't call out to
>    other systems, but would just pick up small files you prepare from a
>    testing repo and pass them around. The test script could also simulate
>    unavailability, etc. This relieves you of a huge responsibility of
>    setting up systems, docker containers and loading them with data.
>    Airflow sets up pretty quickly as a docker container and you can also
>    start up a sample database with that. Afterwards, from a test script,
>    you can check which workflow was followed by inspecting the database,
>    so develop some instrumentation for that.
> 3. Test the data manipulation in a UAT environment, mirroring the runs in
>    production to some extent. That would be a place to verify if the data
>    comes out correctly and also show people what kind of monitoring is in
>    place to double-check that.
>
> On Tue, May 9, 2017 at 1:14 AM, Arnie Salazar <[email protected]>
> wrote:
>
> > Scratch that. I see the whole video now.
> >
> > On Mon, May 8, 2017 at 3:33 PM Arnie Salazar <[email protected]>
> > wrote:
> >
> > > Thanks Sam!
> > >
> > > Is there a part 2 to the video? If not, can you post the "next steps"
> > > notes you took whenever you have a chance?
> > >
> > > Cheers,
> > > Arnie
> > >
> > > On Mon, May 8, 2017 at 3:08 PM Sam Elamin <[email protected]> wrote:
> > >
> > >> Hi Folks
> > >>
> > >> For those of you who missed it, you can catch the discussion from the
> > >> link on this tweet
> > >> <https://twitter.com/samelamin/status/861703888298225670>
> > >>
> > >> Please do share and feel free to get involved as the more feedback we
> > >> get the better the library we create is :)
> > >>
> > >> Regards
> > >> Sam
> > >>
> > >> On Mon, May 8, 2017 at 9:43 PM, Sam Elamin <[email protected]>
> > >> wrote:
> > >>
> > >> > Bit late notice but the call is happening today at 9 15 utc so in
> > >> > about 30 mins or so
> > >> >
> > >> > It will be recorded but if anyone would like to join in on the
> > >> > discussion the hangout link is
> > >> > https://hangouts.google.com/hangouts/_/mbkr6xassnahjjonpuvrirxbnae
> > >> >
> > >> > Regards
> > >> > Sam
> > >> >
> > >> > On Fri, 5 May 2017 at 21:35, Ali Uz <[email protected]> wrote:
> > >> >
> > >> >> I am also very interested in seeing how this turns out. Even though
> > >> >> we don't have a testing framework in-place on the project I am
> > >> >> working on, I would very much like to contribute to some general
> > >> >> framework for testing DAGs.
> > >> >>
> > >> >> As of now we are just implementing dummy tasks that test our actual
> > >> >> tasks and verify if the given input produces the expected output.
> > >> >> Nothing crazy and certainly not flexible in the long run.
> > >> >>
> > >> >> On Fri, 5 May 2017 at 22:59, Sam Elamin <[email protected]>
> > >> >> wrote:
> > >> >>
> > >> >> > Haha yes Scott you are in!
> > >> >> > On Fri, 5 May 2017 at 20:07, Scott Halgrim
> > >> >> > <[email protected]> wrote:
> > >> >> >
> > >> >> > > Sounds A+ to me. By “both of you” did you include me? My first
> > >> >> > > response was just to your email address.
> > >> >> > >
> > >> >> > > On May 5, 2017, 11:58 AM -0700, Sam Elamin
> > >> >> > > <[email protected]>, wrote:
> > >> >> > > > Ok sounds great folks
> > >> >> > > >
> > >> >> > > > Thanks for the detailed response laura! I'll invite both of
> > >> >> > > > you to the group if you are happy and we can schedule a call
> > >> >> > > > for next week?
> > >> >> > > >
> > >> >> > > > How does that sound?
> > >> >> > > > On Fri, 5 May 2017 at 17:41, Laura Lorenz
> > >> >> > > > <[email protected]> wrote:
> > >> >> > > >
> > >> >> > > > > We do! We developed our own little in-house DAG test
> > >> >> > > > > framework which we could share insights on/would love to
> > >> >> > > > > hear what other folks are up to. Basically we mock a DAG's
> > >> >> > > > > input data, use the BackfillJob API directly to call a DAG
> > >> >> > > > > in a test, and compare its outputs to the intended result
> > >> >> > > > > given the inputs. We use docker/docker-compose to manage
> > >> >> > > > > services, and split our dev and test stack locally so that
> > >> >> > > > > the tests have their own scheduler and metadata database
> > >> >> > > > > and so that our CI tool knows how to construct the test
> > >> >> > > > > stack as well.
> > >> >> > > > >
> > >> >> > > > > We co-opted the BackfillJob API for our own purposes here,
> > >> >> > > > > but it seemed overly complicated and fragile to start and
> > >> >> > > > > interact with our own in-test-process executor like we saw
> > >> >> > > > > in a few of the tests in the Airflow test suite.
> > >> >> > > > > So I'd be really interested in finding a way to streamline
> > >> >> > > > > how to describe a test executor for both the Airflow test
> > >> >> > > > > suite and people's own DAG testing and make that a first
> > >> >> > > > > class type of API.
> > >> >> > > > >
> > >> >> > > > > Laura
> > >> >> > > > >
> > >> >> > > > > On Fri, May 5, 2017 at 11:46 AM, Sam Elamin
> > >> >> > > > > <[email protected]> wrote:
> > >> >> > > > >
> > >> >> > > > > > Hi All
> > >> >> > > > > >
> > >> >> > > > > > A few people in the Spark community are interested in
> > >> >> > > > > > writing a testing library for Airflow. We would love
> > >> >> > > > > > anyone who uses Airflow heavily in production to be
> > >> >> > > > > > involved
> > >> >> > > > > >
> > >> >> > > > > > At the moment (AFAIK) testing your DAGs is a bit of a
> > >> >> > > > > > pain, especially if you want to run them in a CI server
> > >> >> > > > > >
> > >> >> > > > > > Is anyone interested in being involved in the discussion?
> > >> >> > > > > >
> > >> >> > > > > > Kind Regards
> > >> >> > > > > > Sam
