One more comment: I also updated the AIP-4 proposal, Support for System Tests for external systems <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+System+Tests+for+external+systems>, that you referred to, so that it better reflects what we have done so far for GCP operators.
Side comment: We have been using the automated System Tests (the name we found fits better than Integration Tests) for quite some time now in our GCP development, and we have found them super useful for detecting obscure errors (usually related to Python version incompatibilities) before they hit someone else. Here are some example bugs we detected and mostly fixed thanks to them:

- AIRFLOW-3615 <https://issues.apache.org/jira/browse/AIRFLOW-3615>
- AIRFLOW-3527 <https://issues.apache.org/jira/browse/AIRFLOW-3527>
- AIRFLOW-3416 <https://issues.apache.org/jira/browse/AIRFLOW-3416>
- AIRFLOW-3263 <https://issues.apache.org/jira/browse/AIRFLOW-3263>

(There were many more that never made it to JIRA because we detected them before the code was merged.)

J.

On Thu, Jan 3, 2019 at 11:14 AM Jarek Potiuk <jarek.pot...@polidea.com> wrote:

> Hello everyone,
>
> I am really, really happy to help with that, as it has been the focus of
> my attention for the last couple of months in our team at Polidea.
> Maybe we can use what we have done for our own development environment
> for Airflow for Google Cloud Platform.
>
> We are ready to share what we have done and contribute it to Apache in
> whatever form is appropriate: either incorporating parts of what we've
> done or (possibly) using what we've done as a starting point and adding
> what's missing from the current TravisCI setup. I think the latter will
> be far easier and faster - but it's just my opinion, as I know it very
> well now :).
>
> Over the last few months at Polidea (my company) we developed (and
> contributed to Airflow's contrib) more than 30 Google Cloud Platform
> related operators and a number of bugfixes to the core Airflow. We
> worked as a team (3 people) and we created a pretty complete,
> sophisticated and very well documented development environment to be
> more productive and to work as a team. We are going to add 40 more
> operators and new team members in the coming months, so we had to be
> productive :).
>
> You can find our environment here:
> https://github.com/PolideaInternal/airflow-breeze - we call it '*Airflow
> Breeze*', as in *"it's a breeze to work with Airflow and GCP"*. It's
> targeted at making our work on Google Cloud Platform operators easier,
> but it implements many of the things you are talking about:
>
> *Supported features:*
>
> - A simplified, nicely layered Dockerfile
>   <https://github.com/PolideaInternal/airflow-breeze/blob/master/Dockerfile>,
>   optimised for build speed (especially for the cassandra driver), that
>   supports three Python versions - 2.7, 3.5 (used in Google Composer)
>   and 3.6. Note that there are many compatibility problems between 3.5
>   and 3.6, so we introduced all three versions.
> - The Google Cloud Build CI scripts are already part of the image
>   (similar to what was suggested for the Travis CI ones).
> - We dropped *tox* support in favour of Google Cloud Build parallel
>   builds with separate docker containers.
> - We have built-in support for unique naming of resources so that
>   multiple builds
> - We automate the local environment (virtualenvs) so that some unit and
>   system tests can be run locally, for example from a local IDE, and not
>   only via the docker container, which makes debugging far easier.
> - Documentation on how to work with unit tests
>   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
>   and system tests
>   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.unittests.md>
>   (see below for a description of the system tests), including a
>   description of how to integrate with IntelliJ/PyCharm and debug
>   efficiently, including remote debugging of the environment (includes
>   some screenshots).
> - Support for automated Cloud Build and system tests
> - A nice, documented ./run_environment.sh
>   <https://github.com/PolideaInternal/airflow-breeze#appendix-current-run_environment-flags>
>   script that supports building, uploading and downloading images
>   to/from the registry, choosing the GCP project id and service account
>   keys, and multiple workspaces
> - Prerequisites, setting up and bootstrapping the local project from
>   scratch
>   <https://github.com/PolideaInternal/airflow-breeze/blob/master/README.setup.md>
>   - documentation plus automation of the project checkout and shared
>   team configuration, including documentation on how to configure your
>   local virtualenvs and manage the docker image and the whole environment
> - The Dockerfile and ./run_environment.sh are built in such a way that
>   local sources are shared with the Docker container, so you can edit
>   your sources while running the tests in the container. Super helpful
>   for a fast development cycle.
> - A number of nice small development features, such as bash history
>   support in docker, automated setting of common configuration variables
>   shared between the team etc.
>
> *What's missing:*
>
> - What is missing compared to the current Travis CI setup is docker
>   compose to support external dependencies (mysql etc.). This does not
>   play well with Google Cloud Build and its docker-in-docker approach,
>   but if we run in Travis CI it should be perfectly fine to run the
>   airflow-breeze image there through docker compose. Alternatively, it
>   might turn out easier to install mysql within the image itself rather
>   than use docker compose - that would make it much easier to multiply
>   docker instances and run them in parallel.
>   In our environment we start a Postgres DB in docker and run all system
>   tests using the local executor + Postgres, and it's super easy to run
>   tests on multiple environments this way, even on the same machine
>   (this will be more complex with docker compose).
> - Also, Breeze is closely tied to Google Cloud for Cloud Build, but we
>   can fairly easily make that an optional component. We also have not
>   focused on Kubernetes workers, but as I understand it we want to go to
>   GKE, which would make it even better: we will need Google Cloud
>   Platform integration baked in, we already have it, and we could use
>   the same mechanisms. We can also leverage our contacts with the Google
>   team and maybe ask Google to donate some recurring credits for a
>   shared Airflow GCP project where we can integrate everything.
>
>
> *Some more information about Airflow Breeze's Cloud Build support and
> System Tests.*
>
> We have a design doc
> <https://docs.google.com/document/d/15hdqL4bWU0646nAvxsEjIEr0gHOhMu6OByDWI1oiE7w/edit?usp=drive_web&ouid=112320280470690058978>
> that describes the whole environment. A number of things there are GCP
> related - we have integration with Google Cloud services (Cloud Build,
> Functions, PubSub, Repositories) to run our automated System Tests. One
> interesting feature of Airflow Breeze is the ability to easily configure
> and run System Tests with Google Cloud Platform (
> https://github.com/PolideaInternal/airflow-breeze/blob/master/README.systemtests.md).
> We also have really nice Slack notifications
> <https://github.com/PolideaInternal/airflow-breeze/blob/master/images/slack_notification.png>
> after a build is complete, plus an automated summary
> <https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/index.html>
> showing the results of the automated system tests, automatically
> generated documentation
> <https://storage.googleapis.com/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/docs/index.html>
> and logs from the system tests
> <https://console.cloud.google.com/storage/browser/polidea-airflow-builds/6ed0e876-2fe3-41b4-90d0-4fa839901085/logs?project=polidea-airflow>.
> We do not aim for it to replace Travis CI (which we also run) - it's
> complementary to Travis and runs only the relevant GCP unit tests and
> System Tests against our real GCP project.
>
> I initially described our intentions in AIP-4
> <https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-4+Support+for+Integration+Tests>
> (which is also mentioned in AIP-7) - but I will soon change the AIP-4
> description to match what we've actually developed for our own usage,
> which is GCP-specific and not aimed at replacing the Travis CI testing.
>
> *A few more words about the Google Cloud Platform integration of Airflow
> Breeze*
>
> Currently it is implemented in such a way that each team can have its
> own Google Project ID to work on (or even several projects, because we
> support multiple workspaces), and we have a way to easily bootstrap the
> GCP project from scratch. That includes automated setup of all the
> required permissions, service accounts and service APIs, creating and
> filling test buckets, preparing Google Cloud Build triggers and so on -
> so literally in 20 minutes you can have a new GCP project up and
> running, ready to run your system tests.
>
> I would be super happy if we could contribute what we've done there.
> Currently we have a very small commit that we cherry-pick into our
> branches to be able to use automated Cloud Build (namely the
> cloudbuild.yaml file - similar to .travis.yml), but if we can modify it
> and make it part of the main Apache project, we would be more than happy
> to do it!
>
> Let me know what you think!
>
> J.
>
>
> On Wed, Jan 2, 2019 at 10:57 PM Daniel Imberman <daniel.imber...@gmail.com>
> wrote:
>
>> Hi guys, I've set up a few sub-projects for this. @gerardo @fokko Lemme
>> know what you guys think
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Optimizing+Docker+Image+Workflow
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/Kubernetes+Testing%3A+Using+GKE+instead+of+Minikube
>>
>> On Tue, Jan 1, 2019 at 11:45 PM Driesprong, Fokko <fo...@driesprong.frl>
>> wrote:
>>
>> > Hi Gerardo,
>> >
>> > Very valid points. I'm fully in favor of your proposal. To simplify the
>> > stack, I strongly believe we should also strip out tox and fully rely on
>> > Docker. Using tox will add another layer that doesn't add a lot of value
>> > from my perspective. Also, we should bake all the *.sh bootstrap scripts
>> > <https://github.com/apache/incubator-airflow/tree/master/scripts/ci> in
>> > the Docker container, instead of having to set this up before running
>> > the tests.
>> >
>> > In the upcoming months, I might have a bit more time to spend on
>> > Airflow, and I'm happy to assist you on this one.
>> >
>> > Cheers, Fokko
>> >
>> > On Wed, Jan 2, 2019 at 06:51, Daniel Imberman
>> > <daniel.imber...@gmail.com> wrote:
>> >
>> > > @gerardo thank you for setting this up.
>> > >
>> > > I've been extremely interested in this as well. I've been messing with
>> > > GCP VM instances in the past few weeks to try to simplify my local
>> > > build as well. Would definitely be interested in helping with the AIP +
>> > > implementation.
>> > >
>> > > One thing I believe we should do is set up the ci base-image with all
>> > > of the pip dependencies pre-loaded. A lot of time is wasted pip
>> > > installing dependencies. We can auto-generate new images whenever a PR
>> > > is submitted to this repository and then specify the tag in the
>> > > .travis.yml when building.
>> > >
>> > > On the k8s side, I think we need to move away from minikube for k8s
>> > > testing. I discussed in a previous email setting travis to work with
>> > > GKE. I'd be careful about coupling k8s stuff too tightly with a docker
>> > > infrastructure. That can get pretty dicey. I think as long as we're
>> > > using a separate k8s cluster the k8s executor tests only need to gather
>> > > the IP addresses + have access to the kubeconfig.
>> > >
>> > >
>> > > On Tue, Jan 1, 2019 at 8:10 PM Gerardo Curiel <gera...@gerar.do> wrote:
>> > >
>> > > > Hi folks,
>> > > >
>> > > > I've created an AIP for simplifying Airflow's development workflow:
>> > > >
>> > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-7+Simplified+development+workflow
>> > > >
>> > > > The goal of this proposal is to outline the work needed to make local
>> > > > testing significantly easier and standardise the best practices to
>> > > > contribute to the Airflow project.
>> > > >
>> > > > Any input on it would be greatly appreciated.
>> > > >
>> > > > Cheers,
>> > > >
>> > > > --
>> > > > Gerardo Curiel // https://gerar.do
>
> --
>
> Jarek Potiuk
> Polidea <https://www.polidea.com/> | Principal Software Engineer
>
> M: +48 660 796 129
> E: jarek.pot...@polidea.com
>
> We create human & business stories through technology.
> Check out our projects! <https://www.polidea.com/our-work>

--

Jarek Potiuk
Polidea <https://www.polidea.com/> | Principal Software Engineer