Thanks! I like the suggestion about testing hooks rather than whole DAGs - we will certainly use it in the future. BDD is also an approach I really like - thanks for the code examples! We might adopt it in the near future as well. Super helpful!

So far we have mocked hooks in our unit tests only (for example here <https://github.com/PolideaInternal/incubator-airflow/blob/master/tests/contrib/operators/test_gcp_compute_operator.py#L241>) - that helps to test the logic of more complex operators.
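In essence the pattern is the following (a minimal sketch only; the operator name, the patched hook path and the asserted default arguments are illustrative assumptions, close to, but not necessarily identical to, the linked test):

import unittest
from unittest import mock

from airflow.contrib.operators.gcp_compute_operator import GceInstanceStartOperator


class GceInstanceStartTest(unittest.TestCase):
    # Patch the hook where the operator's module looks it up, so that
    # execute() never talks to a real GCP project.
    @mock.patch('airflow.contrib.operators.gcp_compute_operator.GceHook')
    def test_instance_start(self, mock_hook):
        op = GceInstanceStartOperator(
            project_id='example-project',
            zone='europe-west1-b',
            resource_id='testinstance',
            task_id='gcp_compute_start_task',
        )
        op.execute(None)
        # The operator's logic is verified purely through its interactions
        # with the mocked hook - no credentials or network access needed.
        mock_hook.assert_called_once_with(api_version='v1',
                                          gcp_conn_id='google_cloud_default')
        mock_hook.return_value.start_instance.assert_called_once()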
@Anthony - we also use a modified docker-based environment to run the tests (https://github.com/PolideaInternal/airflow-breeze/tree/integration-tests), including running full DAGs. And yeah, the missing import was just an exaggerated example :) we also use IDE/lints to catch those early :D. I still think there is a need to run whole DAGs on top of testing operators and hooks separately - this is to test slightly more complex interactions between the operators. In our case we use example DAGs both for documentation and for running full e2e integration tests (for example here https://github.com/PolideaInternal/incubator-airflow/blob/master/airflow/contrib/example_dags/example_gcp_compute.py). Those are simple examples, but we will have somewhat more complex interactions and it would be great to be able to run them more quickly. However, if we get the hook tests automated/unit-testable as well, maybe our current approach, where we run them in the full dockerized environment, will be good enough.

J.

On Thu, Oct 18, 2018 at 5:44 PM Anthony Brown <anthony.br...@johnlewis.co.uk> wrote:

> I have pylint set up in my IDE, which catches most silly errors like missing imports.
> I also use a docker image so I can start up airflow locally and manually test any changes before trying to deploy them. I use a slightly modified version of https://github.com/puckel/docker-airflow to control it. This only works with connections I have access to from my machine.
> Finally, I have a suite of tests based on https://blog.usejournal.com/testing-in-airflow-part-1-dag-validation-tests-dag-definition-tests-and-unit-tests-2aa94970570c which I can run to check that DAGs are valid, plus any unit tests I want to add. The tests are run in a docker container which runs a local db instance, so I have access to xcoms etc.
>
> As part of my deployment pipeline, I run pylint and the tests again before deploying anywhere, to make sure nobody has forgotten to run them locally.
>
> Gerard - I like the suggestion about using mocked hooks and BDD. I will look into this further.
>
> On Thu, 18 Oct 2018 at 15:12, Gerard Toonstra <gtoons...@gmail.com> wrote:
>
> > There was a discussion about a unit testing approach last year (2017), I believe. If you dig through the mail archives, you can find it.
> >
> > My take is:
> >
> > - You should test hooks against some real system, which can be a docker container. Make sure the behavior is predictable when talking to that system. Hook tests are not part of the general CI tests because of the complexity of the CI setup you'd have to build, so they are run on local boxes.
> > - Maybe add additional "mock" hook tests, mocking out the connected systems.
> > - When hooks are tested, operators can use mocked hooks that no longer need access to actual systems. You can then set up an environment where you have predictable inputs and outputs and test how the operators act on them. I've used "behave" to do that with very simple record sets, but you can make these as complex as you want.
> > - Then you know your hooks and operators work functionally. Testing whether your workflow works in general can be implemented by adding "check" operators. The benefit here is that you don't test the workflow once; you test for data consistency every time the DAG runs. If you have complex workflows where the correct behavior of the flow is worrisome, then you may need to go deeper into it.
> >
> > The above doesn't depend on DAGs that need to be scheduled and the delays involved in that.
> >
> > All of the above is implemented in my repo https://github.com/gtoonstra/airflow-hovercraft , using "behave" as a BDD method of testing, so you can peruse that.
> >
> > Rgds,
> >
> > G
> >
> > On Thu, Oct 18, 2018 at 2:43 PM Jarek Potiuk <jarek.pot...@polidea.com> wrote:
> >
> > > I am also looking to have (I think) a similar workflow. Maybe someone has done something similar and can give some hints on how to do it the easiest way?
> > >
> > > Context:
> > >
> > > While developing operators I am using example test DAGs that talk to GCP. So far our "integration tests" require copying the dag folder, restarting the airflow servers, unpausing the DAG and waiting for it to start. That takes a lot of time, sometimes just to find out that you missed one import.
> > >
> > > Ideal workflow:
> > >
> > > Ideally I'd love to have a "unit" test (i.e. possible to run via nosetests or IDE integration/PyCharm) that:
> > >
> > > - should not need to have the airflow scheduler/webserver started. I guess we need a DB, but possibly an in-memory, on-demand created database might be a good solution
> > > - loads the DAG from a specified file (not from the /dags directory)
> > > - builds the internal dependencies between the DAG tasks (as specified in the DAG)
> > > - runs the DAG immediately and fully (i.e. runs all the "execute" methods as needed and passes XCom between tasks)
> > > - ideally produces log output in the console rather than in per-task files
> > >
> > > I thought about using DagRun/DagBag but have not tried it yet, and I am not sure whether you need to have the whole environment set up (which parts?). Any help appreciated :) ?
> > >
> > > J.
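A rough sketch of what such a test could look like with Airflow 1.10-era APIs (DagBag plus TaskInstance.run in test mode) - the dag_folder path and dag_id are illustrative, and a metadata database (even a throwaway SQLite file) still needs to be initialized for XCom to work:

import unittest
from datetime import datetime

from airflow import models
from airflow.utils.state import State


class RunWholeDagTest(unittest.TestCase):
    def test_run_example_dag(self):
        # Load DAGs from an explicit folder rather than the configured
        # dags directory.
        dagbag = models.DagBag(dag_folder='airflow/contrib/example_dags',
                               include_examples=False)
        dag = dagbag.get_dag('example_gcp_compute')
        self.assertIsNotNone(dag)

        execution_date = datetime(2018, 10, 18)
        # Execute tasks in dependency order. test_mode=True runs each task
        # without a scheduler and without recording state in the DB, and
        # logs go to the console instead of the per-task log files.
        for task in dag.topological_sort():
            ti = models.TaskInstance(task=task, execution_date=execution_date)
            ti.run(ignore_ti_state=True, test_mode=True)
            self.assertEqual(State.SUCCESS, ti.state)

This is essentially what the `airflow test` CLI command does for a single task, extended over the whole DAG in topological order.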
> > >
> > > On Thu, Oct 18, 2018 at 1:08 AM bielllob...@gmail.com <bielllob...@gmail.com> wrote:
> > >
> > > > I think it would be great to have a way to mock airflow for unit tests. The way I approached this was to create a context manager that creates a temporary directory, sets the AIRFLOW_HOME environment variable to this directory (only within the scope of the context manager) and then renders an airflow.cfg to that location. This creates an SQLite database just for the test, so you can add the variables and connections needed for the test without affecting the real Airflow installation.
> > > >
> > > > The first thing I realized is that this didn't work if the imports were outside the context manager, since airflow.configuration and airflow.settings perform all their initialization when they are imported, so the AIRFLOW_HOME variable is already set to the real installation before getting inside the context manager.
> > > >
> > > > The workaround for this was to reload those modules, and this works for the tests I have written. However, when I tried to use it for something more complex (I have a plugin that I'm importing), I noticed that inside the operator in this plugin, AIRFLOW_HOME is still set to the real installation, not the temporary one for the test. I thought this must be related to the imports, but I haven't been able to figure out a way to fix the issue. I tried patching some methods, but I must have been missing something because the database initialization failed.
> > > >
> > > > Does anyone have an idea on the best way to mock/patch airflow so that EVERYTHING that is executed inside the context manager uses the temporary installation?
> > > >
> > > > PS: This is my current attempt, which works for the tests I defined but not for external plugins: https://github.com/biellls/airflow_testing
> > > >
> > > > For an example of how it works: https://github.com/biellls/airflow_testing/blob/master/tests/mock_airflow_test.py
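A minimal sketch of the kind of context manager described above, assuming Python 3 and Airflow 1.10-era modules (the name mock_airflow_home and the exact reload list are assumptions reconstructed from the description, not the actual airflow_testing code):

import importlib
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def mock_airflow_home():
    """Point AIRFLOW_HOME at a throwaway directory for the duration of a
    test, then restore the original value on exit."""
    old_home = os.environ.get('AIRFLOW_HOME')
    with tempfile.TemporaryDirectory() as tmp_dir:
        os.environ['AIRFLOW_HOME'] = tmp_dir
        # airflow.configuration and airflow.settings read AIRFLOW_HOME at
        # import time, so they must be reloaded to pick up the new value.
        import airflow.configuration
        import airflow.settings
        importlib.reload(airflow.configuration)
        importlib.reload(airflow.settings)
        from airflow.utils.db import initdb
        initdb()  # creates a fresh SQLite metadata DB under tmp_dir
        try:
            yield tmp_dir
        finally:
            if old_home is not None:
                os.environ['AIRFLOW_HOME'] = old_home
            else:
                del os.environ['AIRFLOW_HOME']
            # a fuller version would also reload the modules again here

As described above, the open problem with this approach is that anything imported before entering the context manager (such as plugins) may still hold references to the old configuration.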
> > > --
> > > *Jarek Potiuk, Principal Software Engineer*
> > > Mobile: +48 660 796 129

> --
> Anthony Brown
> Data Engineer BI Team - John Lewis
> Tel : 0787 215 7305

--
*Jarek Potiuk, Principal Software Engineer*
Mobile: +48 660 796 129