potiuk commented on issue #4543: [AIRFLOW-3718] [WIP] Multi-layered version of 
the docker image
URL: https://github.com/apache/airflow/pull/4543#issuecomment-471476833
 
 
   Just rebased the change and squashed it into a single commit. 
   Some further progress, @ashb @fokko:
   
   ## Tests / Dockerfile / Travis CI status
   
   I am fixing the failing tests one by one: 
   * Most postgres/sqlite tests work
   * mysql tests do not start because of a hostname problem in docker compose 
(looking at it)
   * Some other tests are also failing: Hadoop tests do not work (most likely 
some variable is not passed properly)
   * Python virtualenv tests do not pass yet (missing virtualenv in the CI 
image - I will add it)
   * I changed the Dockerfile to always use the `airflow` user to install 
airflow, as it was in the original airflow-CI image - this fixed a number of 
failing tests as well (mainly SFTP, as it was trying to ssh to the airflow user)
   * I am also using dumb-init now (it was easier to install via pip than 
installing tini) so that when we start the webserver/scheduler inside the docker 
container, TERM signals are properly propagated to them when the container is 
stopped (as per our separate discussion)
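
To illustrate the signal propagation mentioned in the last point, here is a minimal POSIX-only sketch of what an init-style wrapper does: it starts a child and relays TERM to it instead of exiting and leaving the child behind. This is not how dumb-init itself is implemented, and the function name `run_and_forward_term` is hypothetical; it only shows the idea.

```python
import signal
import subprocess


def run_and_forward_term(cmd):
    """Run cmd as a child process, relay SIGTERM to it, return its exit code.

    Hypothetical helper for illustration only (POSIX); dumb-init does this
    job for real when it runs as PID 1 in the container.
    """
    child = subprocess.Popen(cmd)

    def forward(signum, frame):
        # Relay the signal to the child rather than dying and orphaning it.
        child.send_signal(signum)

    previous = signal.signal(signal.SIGTERM, forward)
    try:
        while True:
            try:
                # Wait in short slices so the Python-level handler gets a
                # chance to run promptly between waits.
                return child.wait(timeout=0.1)
            except subprocess.TimeoutExpired:
                continue
    finally:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGTERM, previous)
```

Without such forwarding, a process started as PID 1 that installs no handler for TERM does not receive the default terminate action, so stopping the container typically ends with a hard kill after the stop timeout.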
   
   ## Local testing environment - equivalent to Travis CI
   
   Actually, in order to fix the tests I had to make it easy to run them 
locally. Therefore I ported a lot of good ideas from the ["Airflow Breeze" 
environment](https://github.com/PolideaInternal/airflow-breeze/blob/master/README.md).
  We have used it for the last half-year, and I took the best and most relevant 
parts of it and added a very friendly `./run_it_environment.sh` script that 
manages a local environment parallel to what we have in Travis CI:
   
   * It spins up the same environment as we have in Travis, but it runs locally 
and is development/local test friendly (for example, it maps local sources 
directly to the `airflow-testing` docker container).
   * It also manages the whole lifecycle of the docker images - it will 
pull/rebuild the image as needed (but it also lets you enter the environment 
super-fast (under 2 seconds) with the `-i` flag once you have the docker images 
built). 
   * It is rather user-friendly: `./run_it_environment.sh --python 3.6 --backend mysql --env docker` 
will do exactly what you expect it to do. It will drop you into a shell inside 
the docker-compose environment, and you can simply do `./run_tests.sh ....` to 
run the tests (including all integration tests requiring hadoop/mysql/postgres, 
whatever).
   * You can also use `./run_it_environment.sh` to run tests automatically 
instead of entering interactive bash (you can specify `--test-target` on the 
command line).
   * You will be able (that's not ported yet :)) to forward local ports on your 
host to the webserver/postgres/mysql databases running in your environment - you 
will be able to connect to them with your local browser/database 
management/browsing tools.
   * You can see all the current command line switches here: 
https://github.com/PolideaInternal/airflow/blob/multi-layered-dockerfile/CONTRIBUTING.md#run-it-environment-flags
   * I have not looked at the kubernetes tests yet - but it might be easier to 
fix them by adding another docker-compose image rather than using the current 
minikube approach (as suggested by AIP-7) - I might actually be able to get 
some community support for that (see below)
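
As a rough sketch of how the invocations described above could be scripted (for example to try several backends in turn), the helper below assembles the argument list using only the flags quoted in this comment. The function name `build_command` and the `test_target` value in the example are hypothetical, and whether the script accepts every combination is an assumption.

```python
import shlex


def build_command(python_version, backend, env, test_target=None):
    """Return the argv list for ./run_it_environment.sh.

    Hypothetical helper: it only uses flags quoted in this comment
    (--python, --backend, --env, --test-target).
    """
    argv = [
        "./run_it_environment.sh",
        "--python", python_version,
        "--backend", backend,
        "--env", env,
    ]
    if test_target is not None:
        # Without a test target the script enters an interactive shell.
        argv += ["--test-target", test_target]
    return argv


if __name__ == "__main__":
    # "tests.core" is a placeholder value, not a documented target name.
    print(shlex.join(build_command("3.6", "mysql", "docker")))
    print(shlex.join(build_command("3.6", "mysql", "docker", "tests.core")))
```

The resulting list can be passed directly to `subprocess.run(...)` from the root of the source tree.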
   
   This `./run_it_environment.sh` also seems to address most of AIP-7 
("Simplified development workflow") - so we can kill two birds with one 
stone. I went ahead and updated the [CONTRIBUTING 
doc](https://github.com/PolideaInternal/airflow/blob/multi-layered-dockerfile/CONTRIBUTING.md#setting-up-a-development-environment)
 in my branch - I made a clean distinction between the three environments we 
have (Virtualenv, Docker container, Docker Compose-based integration test 
environment). I also explained the advantages/disadvantages of each environment 
- I think it is not clear, especially for first-time contributors, which 
environment they should use in which case.
   
   ## Involving community
   
   I think it's high time to involve the community more and get a bit 
more feedback (especially in relation to AIP-7) even before I fix all the 
tests. I did some restructuring of the CI scripts to make them more 
obvious/consistent, and I would like to keep it running as a PR with Travis 
tests for some time (continuously rebasing it) and see whether the build 
times/test results are as I expect them to be. I can continue doing that using 
my private DockerHub/Travis CI and share the results with the community.
   
   Before I involve the community, I also want to update AIP-10 
("Multi-layered docker file") with a bit more explanation (and actual design 
:) ) of how the current multi-stage Dockerfile is constructed, what images 
will be in the DockerHub registry, and how it will work for different cases, 
including caching/wheel behaviour (it works slightly differently for the 
DockerHub build, the Travis build, and the local build in 
`./run_it_environment.sh`). There are different optimisations implemented for 
those different cases to minimise build times. 
   
   Once I involve the community, maybe I will be able to get some help, 
especially in making the kubernetes tests work properly - it was suggested in 
AIP-7 that we should do it by moving minikube to a separate docker-compose 
image, and I think it should be fairly easy.
   
   Looking forward to your comments :) What do you think? 
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
