On Sat, Feb 15, 2020, 20:54 Ash Berlin-Taylor <a...@apache.org> wrote:
> Yeah, I'm for this.
>
> In fact I'm about to mark some of the Hive ones as system tests, as they
> require a running Hive cluster.
>
> I would be careful about automatically marking unit tests that run
> DAGs as system/integration tests though: a number of our unit tests rely
> on this to test tasks in various states in parts of the scheduler. Ideally
> they wouldn't, but right now they do, and the tasks they run are of the
> DummyOperator or "bash_command=date" flavour.

Agree with Ash here. I think for this we should have yet another category
('dag tests'? 'core tests'?). Those are indeed run using DAGs, with the
whole Airflow underneath, but their purpose is to test Airflow core, not
external systems.

J.

> -ash
>
> On Feb 15 2020, at 7:28 pm, Tomasz Urbaszek <turbas...@apache.org> wrote:
> > +1 for introducing system tests. Lack of them is a big pain.
> >
> > I would also like to suggest marking some of our existing tests (those
> > running DAGs, etc.) as system tests. Then we can simplify our unit tests
> > and probably speed up CI builds (not to mention the reduction of side
> > effects). The approach used for GCP system tests, which runs an example
> > DAG, makes creating such tests really easy (or we can generate them
> > automatically...).
> >
> > Regarding the frequency of such tests, I think we should run all of
> > them daily on master. Or run them when there is a change in specific
> > files (operators / hooks etc.).
> >
> > Tomek
> >
> > On Sat, Feb 15, 2020 at 1:15 PM Jarek Potiuk <jarek.pot...@polidea.com>
> > wrote:
> > >
> > > TL;DR; I would like to revive a discussion (hopefully short :) and
> > > possibly cast a vote on "AIP-4 - Support for System Tests for external
> > > systems".
> > >
> > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems
> > >
> > > This is the very first AIP I created, almost 1.5 years ago, and it took
> > > a very long time to get to the point where I think we are very close to
> > > being able to implement it, after the many baby steps (and some bigger
> > > leaps) that we've made in the meantime.
> > >
> > > *Let me quickly summarise the context:*
> > > - One of Airflow's biggest advantages is its integrations with external
> > >   systems. We have, I think, several hundred hooks and operators
> > >   working with those external systems.
> > > - We have an extensive set of tests, both unit and integration, that
> > >   are sometimes really good at catching a lot of problems, but they can
> > >   only do as much as mocking out access to the external systems allows.
> > >   Unit/integration tests are great for testing the core of Airflow and
> > >   its functionality, but the external services cannot be effectively
> > >   tested this way.
> > > - The external services sometimes change - new versions of tools,
> > >   services etc. are released every day, and sometimes, even if we mock
> > >   them perfectly in unit tests, the hooks simply stop working at some
> > >   point in time.
> > > - I think there is a need to regularly run some tests at the system
> > >   level - communicating with "real" external systems and testing our
> > >   operators. Let's call them System Tests. They do not necessarily need
> > >   to be run with every PR, but I think running them regularly makes
> > >   perfect sense.
> > >
> > > *Why now? Why does this seem to be a good time to do it?*
> > > - We switched to pytest and we already have a separation of
> > >   unit/integration tests in place - we can add support for system tests
> > >   using the same mechanisms.
> > > - With AIP-21 we grouped the tests into "providers" packages, which
> > >   makes it easy to define the boundaries of "systems" - every provider
> > >   is a "system" to test.
> > > - We have plenty of system tests implemented for GCP, which we are
> > >   going to use to run tests for the backported packages from AIP-21 - we
> > >   have followed system test automation for more than a year in the GCP
> > >   operators and we have it fully automated already.
> > > - In the latest PR - https://github.com/apache/airflow/pull/7389 - we
> > >   even extracted the GCP-specific way we run system tests, so as to
> > >   a) make it easy for everyone to write automated system tests and
> > >   b) make it possible to automate them.
> > > - We have credits provided by Google to run our tests, and we can use
> > >   them for regular runs of the system tests.
> > > - We are close to switching over to GitHub Actions, which will make it
> > >   easy to write manually triggered or regularly scheduled actions that
> > >   have securely stored credentials to run the system tests - in a way
> > >   that is controlled by committers and cannot be abused by contributors
> > >   who prepare PRs.
> > > - I would like to start and lead a community-driven effort in which we
> > >   split the writing of missing tests amongst community members, so that
> > >   our new backport packages can be tested against the latest released
> > >   version of 1.10.*. We will provide the GCP tests as examples, and we
> > >   will also set up the automation needed to run the tests regularly -
> > >   the only thing we will ask of the members of the community is to write
> > >   the missing tests. This way I hope we can get very high coverage of
> > >   the backported packages.
> > >
> > > There are of course still a number of open questions - like how to
> > > store credentials, how often to run the tests etc. - but I think those
> > > are implementation details that we can work out while we are
> > > implementing it.
> > >
> > > What do you think about it?
> > > If I have a lot of "yeses" quickly, I would love to start voting on
> > > AIP-4.
> > >
> > > J.
> > >
> > > --
> > > Jarek Potiuk
> > > Polidea <https://www.polidea.com/> | Principal Software Engineer
> > >
> > > M: +48 660 796 129 <+48660796129>