potiuk commented on issue #4543: [AIRFLOW-3718] [WIP] Multi-layered version of the docker image
URL: https://github.com/apache/airflow/pull/4543#issuecomment-471476833

Just rebased the change and squashed it into a single commit. Some further progress @ashb @fokko:

## Tests / Dockerfile / Travis CI status

I am fixing the failing tests one by one:

* Most postgres/sqlite tests work.
* mysql tests do not start because of a hostname problem in docker compose (looking at it).
* Some other tests are also failing: Hadoop tests do not work (most likely some variable is not passed properly).
* Python virtualenv tests do not pass yet (virtualenv is missing in the CI image - I will add it).
* I changed the Dockerfile to always use the `airflow` user to install airflow, as in the original airflow-CI image. This fixed a number of failing tests as well (mainly SFTP, as it was trying to ssh to the airflow user).
* I am also using dumb-init now (it was easier to install via pip than installing tini), so that when we start the webserver/scheduler inside the docker container, TERM signals are properly propagated to them when the container is stopped (as per our separate discussion).

## Local testing environment - equivalent to Travis CI

To fix the tests I had to make it easy to run them locally. Therefore I ported a lot of good ideas from the ["Airflow Breeze" environment](https://github.com/PolideaInternal/airflow-breeze/blob/master/README.md). We have used it for the last half year, and I took the best and most relevant parts of it and added a very friendly "./run_it_environment.sh" script that manages a local environment parallel to what we have in Travis CI:

* It spins up the same environment as we have in Travis, but it runs locally and is friendly to development and local testing (for example, it maps local sources directly into the `airflow-testing` docker container).
* It also manages the whole lifecycle of the docker images: it will pull/rebuild the image as needed, but it also lets you enter the environment super-fast (under 2 seconds) with the -i flag once you have the docker images built.
* It is rather user-friendly: `./run_it_environment.sh --python 3.6 --backend mysql --env docker` will do exactly what you expect it to do. It will drop you into a shell with the docker-compose environment running, and you can simply do `./run_tests.sh ....` to run the tests (including all integration tests requiring hadoop/mysql/postgres, whatever).
* You can also use `./run_it_environment.sh` to run tests automatically instead of entering interactive bash (you can specify `--test-target` on the command line).
* You will be able (that's not ported yet :)) to forward local ports on your host to the webserver/postgres/mysql databases running in your environment, so you will be able to connect to them with your local browser or database management/browsing tools.
* You can see all the current command line switches here: https://github.com/PolideaInternal/airflow/blob/multi-layered-dockerfile/CONTRIBUTING.md#run-it-environment-flags
* I have not looked at the kubernetes tests yet, but it might be easier to fix them by adding another docker-compose image rather than using the current minikube approach (as suggested by AIP-7). I might be able to get some community support for that (see below).

This `./run_it_environment.sh` also seems to address most of AIP-7 ("Simplified development workflow"), so we can kill two birds with one stone.

I went ahead and updated the [CONTRIBUTING doc](https://github.com/PolideaInternal/airflow/blob/multi-layered-dockerfile/CONTRIBUTING.md#setting-up-a-development-environment) in my branch. I made a clean distinction between the three environments we have (Virtualenv, Docker container, Docker Compose based integration test environment).
I also explained the advantages and disadvantages of each environment, since I think it is not clear, especially for first-time contributors, which environment they should use in which case.

## Involving community

I think it is high time to involve the community more and get a bit more feedback (especially in relation to AIP-7), even before I fix all the tests. I did some restructuring of the CI scripts to make them more obvious and consistent, and I would like to keep this running as a PR with Travis tests for some time (continuously rebasing it) to see whether the build times and test results are as I expect them to be. I can continue doing that using my private DockerHub/Travis CI and share the results with the community.

Before I involve the community, I also want to update AIP-10 ("Multi-layered docker file") with a bit more explanation (and actual design :) ) of how the current multi-stage Dockerfile is constructed, what images will be in the DockerHub registry, and how it will work for different cases, including caching/wheel behaviour (it works slightly differently for the DockerHub build, the Travis build, and the local build in `./run_it_environment.sh`). There are different optimisations implemented for those different cases to minimise build times.

Once I involve the community, maybe I will be able to get some help, especially in making the kubernetes tests work properly. It was suggested in AIP-7 that we should do it by moving minikube into a separate docker-compose image, and I think that should be fairly easy.

Looking forward to your comments :) What do you think?
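
For readers wondering why an init process such as dumb-init matters for the TERM signal point in the status list, here is a minimal Python sketch of the signal forwarding idea. It is an illustration only, not dumb-init's implementation: `run_as_init` is a hypothetical helper name, and the sketch leaves out things a real init does, such as reaping orphaned processes.

```python
import os
import signal
import subprocess


def run_as_init(cmd):
    """Start cmd as a child process and forward TERM/INT to it.

    Hypothetical helper for illustration; returns the child's return code
    (negative signal number if the child was killed by a signal).
    """
    child = subprocess.Popen(cmd)

    def forward(signum, _frame):
        # Pass the signal on to the child instead of acting on it here,
        # so the child gets a chance to shut down.
        os.kill(child.pid, signum)

    # Install the forwarding handler for the signals sent on stop/Ctrl-C.
    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, forward)

    # Block until the child exits; handlers run while we wait.
    return child.wait()
```

Under these assumptions, wrapping the webserver or scheduler start command in such a process means the TERM signal sent when the container is stopped reaches the wrapped process rather than stopping at PID 1.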
