potiuk commented on issue #4543: [AIRFLOW-3718] [WIP] Multi-layered version of 
the docker image
URL: https://github.com/apache/airflow/pull/4543#issuecomment-471200981
 
 
   @ashb @Fokko -> Still WIP, but I squashed it all to one commit. And I did 
the "refactor" phase already. I managed to work out a way of putting all the 
Docker code into a single Dockerfile. It's now quite a big file, but it's nicely, 
logically separated and makes heavy use of multi-stage builds! 
   
   I finally learned how to use multi-stage builds properly :D - something that I 
wanted to do for a long time. 
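   For context, multi-stage builds let one stage do the heavy lifting (e.g. 
building wheels) while a later stage copies only the results, so each stage 
caches independently. A minimal sketch - stage names, base image, and paths here 
are illustrative assumptions, not the actual stages from this PR:

```dockerfile
# Hedged sketch of the multi-stage pattern; stage names and paths are
# illustrative, not the actual stages from this PR.

# Stage 1: build wheels for the package and all its dependencies once.
FROM python:3.6-slim AS builder
WORKDIR /opt/airflow
COPY setup.py /opt/airflow/setup.py
RUN pip wheel --wheel-dir=/wheelhouse .

# Stage 2: the final image installs from the pre-built wheels, so only
# the stages whose inputs changed get rebuilt on subsequent builds.
FROM python:3.6-slim AS main
COPY --from=builder /wheelhouse /wheelhouse
RUN pip install --no-index --find-links=/wheelhouse apache-airflow
```

   With a layout like this, `docker build .` only re-runs the stages whose 
inputs actually changed.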
   
   I think this is really nice and looks manageable long term. This is 
what we will have now:
   
   * Anyone can get the Airflow image by just running `docker build .`. No magic 
scripts involved :). It will build the Airflow image - equivalent to what we had 
before (and of similar size) but multi-layered. It will not be as 
cache-efficient as the other methods (it will only use the local cache from your 
previous local builds), but you can use it to build Airflow from scratch 
without depending on our airflow/* images from DockerHub.
   
   * You can run ./local_docker_build.sh. This will make a proper "cached" build 
- it will automatically pull images from DockerHub so that you do not have to wait 
for all the dependencies to be installed, and it will use those as cache as much 
as possible (i.e. if setup.py did not change, it will only add sources etc.). 
One nice thing about local_docker_build.sh is that it uses a wheel cache 
(similarly to the CI image), so even if dependencies change, they will be 
reinstalled very quickly when you build it.
   
   * CI (Travis) also runs ./local_docker_build.sh to prepare the image for 
testing. By default it will skip building the main Airflow image and the wheel 
cache; it will only build the CI image (which contains more dependencies and 
uses the wheel cache to perform fast rebuilds). This way we save time on Travis 
when preparing the image for tests. Also (for now) I configured the build not to 
reinstall pip dependencies when setup.py does not change for CI builds. This way 
we get what I was talking about for a long time - a stable set of pip-installed 
dependencies on Travis (refreshed automatically every time setup.py changes). 
This means that the dreadful transitive dependency changes will only 
become a problem when someone changes setup.py.
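   For illustration, the "reinstall only when setup.py changed" decision could 
be sketched as a change-detection check like the one below. The file names and 
marker path are hypothetical - the actual mechanism in this PR relies on Docker 
layer caching of the pip-install step, not an explicit hash file:

```shell
#!/usr/bin/env bash
# Hedged sketch: detect whether setup.py changed since the last install.
# Paths and file contents are illustrative assumptions for this sketch;
# the PR itself achieves this via Docker layer caching.
set -euo pipefail

SETUP_FILE="$(mktemp)"                 # stand-in for setup.py in this sketch
echo "install_requires = ['flask']" > "${SETUP_FILE}"

MARKER="$(mktemp)"                     # stores the hash of the last-installed setup.py
CURRENT_HASH="$(md5sum "${SETUP_FILE}" | cut -d' ' -f1)"

if [[ "$(cat "${MARKER}")" == "${CURRENT_HASH}" ]]; then
    REINSTALL="false"                  # unchanged: reuse the cached dependency layer
else
    REINSTALL="true"                   # changed: refresh pip-installed dependencies
    echo "${CURRENT_HASH}" > "${MARKER}"
fi
echo "reinstall=${REINSTALL}"
```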
   
   * The smaller (no wheel cache) main Airflow image (also the CI + wheels + 
dependencies images) will be rebuilt on DockerHub from master after merge and 
pushed there. It will also efficiently use cached images from the previous 
DockerHub build (following the multi-layer architecture). This way DockerHub 
builds will eventually catch up with the latest master commits, and what you get 
from DockerHub will eventually be the freshest version (this way we can also 
check whether the latest set of dependencies can be installed). We can 
immediately detect transitive dependency problems and fix them, while at the 
same time not impacting CI builds that use the latest cached image.
   
   * We can also set up a separate build which uses NO DOCKER CACHE whatsoever 
(by simply setting the variable AIRFLOW_CONTAINER_USE_DOCKER_CACHE to false). It 
will then build everything from scratch. This could be our safety net to 
catch any case where a transitive dependency causes a problem that is 
hidden by caching.
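   The cache toggle could be wired up roughly like this - the image name and 
exact docker flags are illustrative assumptions, not the actual script contents:

```shell
#!/usr/bin/env bash
# Hedged sketch of how AIRFLOW_CONTAINER_USE_DOCKER_CACHE could drive the
# build; the image name and flags are illustrative assumptions.
set -euo pipefail

USE_CACHE="${AIRFLOW_CONTAINER_USE_DOCKER_CACHE:-true}"
if [[ "${USE_CACHE}" == "true" ]]; then
    # reuse layers pulled from DockerHub where the inputs are unchanged
    CACHE_FLAGS="--cache-from apache/airflow:latest"
else
    # safety-net mode: rebuild everything from scratch
    CACHE_FLAGS="--no-cache"
fi
echo "docker build ${CACHE_FLAGS} -t airflow ."   # the command the script would run
```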
   
   
