potiuk commented on issue #4543: [AIRFLOW-3718] [WIP] Multi-layered version of the docker image
URL: https://github.com/apache/airflow/pull/4543#issuecomment-471200981

@ashb @Fokko -> Still WIP, but I squashed it all into one commit and did the "refactor" phase already. I managed to work out a way of putting all the Docker code into a single Dockerfile. It's now quite a big file, but it's nicely, logically separated and uses multi-stage builds a lot! I finally learned how to use multi-stage builds properly :D - something that I wanted to do for a long time. This is really nice I think, and looks pretty manageable long term.

This is what we will have now:

* Anyone can get an airflow image by just running `docker build .`. No magic scripts involved :). It will build an airflow image - equivalent to what we had before (and of similar size) but multi-layered. It will not be as cache-efficient as the other methods (it will only use the local cache from your previous local builds), but you can use it to build airflow from scratch without depending on our airflow/* images from DockerHub.
* You can run `./local_docker_build.sh`. This will make a proper "cached" build - it will automatically pull images from DockerHub so that you do not have to wait for all the dependencies to be installed, and it will use those as cache as much as possible (i.e. if setup.py did not change, it will only add sources etc.). One nice thing about local_docker_build.sh is that it will use a wheel cache (similarly to the CI image), so even if dependencies change, they will be reinstalled very quickly when you build it.
* CI (Travis) also runs ./local_docker_build.sh to prepare the image for testing. By default it will skip building the main airflow image and the wheel cache. It will only build the CI image (which contains more dependencies and uses the wheel cache to perform fast rebuilds). This way we save time on Travis preparing the image for tests.
Also (for now) I configured the build to not reinstall PIP dependencies for CI builds when setup.py does not change. This way we get what I was talking about for a long time - a stable set of pip-installed dependencies on Travis (it will be refreshed automatically every time setup.py changes). This means that the dreadful transitive dependency changes will only become a problem when someone changes setup.py.

* The smaller (no wheel cache) main Airflow image (also CI + wheels + dependencies) will be rebuilt on DockerHub from master after merge and pushed there. It will also efficiently use cached images from the previous DockerHub build (following the multi-layer architecture). This way DockerHub builds will eventually catch up with the latest master commits, and what you get from DockerHub will eventually be the freshest version (this way we can also check if the latest set of dependencies can be installed). We can immediately detect if there is a transitive dependency problem and fix it, but at the same time it will not impact CI builds using the latest cached image.
* We can also set up a separate build which uses NO DOCKER CACHE whatsoever (by simply setting the variable AIRFLOW_CONTAINER_USE_DOCKER_CACHE to false). It will then build everything from scratch. This could be our safety net to catch any case where a transitive dependency causes a problem that is hidden by caching.
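To illustrate the layering described above, here is a minimal sketch of what a multi-stage Dockerfile of this shape might look like. The stage names, base image, and paths are illustrative assumptions, not the actual Dockerfile from this PR:

```dockerfile
# Hypothetical sketch of the multi-stage layout described in this comment.
# Stage names, base image, and paths are assumptions for illustration only.

# Shared base stage: OS-level dependencies used by all later stages
FROM python:3.6-slim AS airflow-base
RUN apt-get update \
    && apt-get install -y --no-install-recommends gcc libffi-dev \
    && rm -rf /var/lib/apt/lists/*

# Wheel-cache stage: only re-runs when setup.py changes, so dependency
# layers stay cached across rebuilds
FROM airflow-base AS airflow-wheels
COPY setup.py /opt/airflow/setup.py
RUN pip wheel --wheel-dir=/wheels /opt/airflow/

# Main image: installs from the pre-built wheels, then adds sources last
# so source-only changes do not invalidate the dependency layers
FROM airflow-base AS airflow-main
COPY --from=airflow-wheels /wheels /wheels
RUN pip install --find-links=/wheels apache-airflow
COPY . /opt/airflow
```

Because the sources are copied in the last layer, a plain `docker build .` after a source-only change reuses every dependency layer from the local cache, which is the behavior the comment describes.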
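The cache toggle could be wired up roughly like this. Only the variable name AIRFLOW_CONTAINER_USE_DOCKER_CACHE comes from the comment above; the image name, script structure, and `--cache-from` usage are assumptions about how such a switch might be implemented:

```shell
#!/usr/bin/env bash
# Sketch of a docker-cache toggle. The image name "airflow/ci:latest" and
# this script's structure are hypothetical; only the variable name
# AIRFLOW_CONTAINER_USE_DOCKER_CACHE comes from the PR comment.
set -euo pipefail

AIRFLOW_CONTAINER_USE_DOCKER_CACHE="${AIRFLOW_CONTAINER_USE_DOCKER_CACHE:-true}"

build_args() {
    if [ "${AIRFLOW_CONTAINER_USE_DOCKER_CACHE}" = "true" ]; then
        # Reuse layers from the image previously pushed to DockerHub
        echo "--cache-from airflow/ci:latest"
    else
        # Safety-net mode: ignore all caches and rebuild from scratch
        echo "--no-cache"
    fi
}

# Print the command instead of running it, so the sketch has no
# dependency on a Docker daemon being available.
echo "docker build $(build_args) ."
```

Setting AIRFLOW_CONTAINER_USE_DOCKER_CACHE=false before running the build would then force a full from-scratch rebuild, which is the "safety net" case described above.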
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
