potiuk commented on a change in pull request #19210:
URL: https://github.com/apache/airflow/pull/19210#discussion_r751158036



##########
File path: Dockerfile
##########
@@ -160,13 +160,21 @@ ARG INSTALL_PROVIDERS_FROM_SOURCES="false"
 # But it also can be `.` from local installation or GitHub URL pointing to specific branch or tag
 # Of Airflow. Note That for local source installation you need to have local sources of
 # Airflow checked out together with the Dockerfile and AIRFLOW_SOURCES_FROM and AIRFLOW_SOURCES_TO
-# set to "." and "/opt/airflow" respectively.
+# set to "." and "/opt/airflow" respectively. Similarly AIRFLOW_SOURCES_WWW_FROM/TO are set to right source
+# and destination
 ARG AIRFLOW_INSTALLATION_METHOD="apache-airflow"
 # By default latest released version of airflow is installed (when empty) but this value can be overridden
 # and we can install version according to specification (For example ==2.0.2 or <3.0.0).
 ARG AIRFLOW_VERSION_SPECIFICATION=""
 # By default we do not upgrade to latest dependencies
 ARG UPGRADE_TO_NEWER_DEPENDENCIES="false"
+# By default we install latest airflow from PyPI so we do not need to copy sources of Airflow
+# www to compile the assets but in case of breeze/CI builds we use latest sources and we override
+# those SOURCES_FROM/TO with "airflow/www" and "/opt/airflow/airflow/www" respectively.
+# This is to rebuild the assets only when any of the www sources change
+ARG AIRFLOW_SOURCES_WWW_FROM="empty"
+ARG AIRFLOW_SOURCES_WWW_TO="/empty"

Review comment:
   The `empty` default is the solution for building the image using either released PyPI packages or local sources (the latter being optimised for iteration speed with kind tests).
   
   By default, when the prod docker image is built, you specify the Airflow version (2.2.2 being the latest) and the image is built using that version from PyPI. This is case 1) of building the PROD image - it does not matter at all what local sources of Airflow you have; the PyPI package is always installed.
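   
   To illustrate case 1), a rough sketch (the image tag is made up; the build args are the ones shown in the Dockerfile snippet above, and AIRFLOW_INSTALLATION_METHOD is already the default):
   
   ```bash
   # Case 1: build from a released PyPI package - local sources are irrelevant,
   # so the defaults AIRFLOW_SOURCES_WWW_FROM="empty" / AIRFLOW_SOURCES_WWW_TO="/empty" stay in place.
   docker build . \
     --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
     --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.2.2" \
     --tag my-airflow-prod:2.2.2
   ```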
   
   For testing and building the image in CI we need to use local sources - so the PROD image in CI is built using packages prepared locally from sources. In this case the "airflow" and "provider" packages are built locally and copied to "docker-context-files" - and installed from there rather than from PyPI. This is case 2) of building the image.
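   
   A rough sketch of that flow (the packaging command and image tag are illustrative only - in reality Breeze prepares the packages, and the build arg that switches installation to docker-context-files is part of the Dockerfile/Breeze tooling and omitted here):
   
   ```bash
   # Case 2 (CI): packages are built from local sources, placed in docker-context-files,
   # and installed from there instead of from PyPI.
   python -m build --wheel --outdir dist .            # illustrative packaging step
   mkdir -p docker-context-files && cp dist/*.whl docker-context-files/
   docker build . --tag my-airflow-prod:local-packages
   ```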
   
   However, the very same Dockerfile is also used locally for "kind" tests - using local sources. We do not have PyPI packages for those local sources, and we do not want to rebuild the airflow package and copy it to "docker-context-files", as that would mean very slow iteration speed. If we used packages, then after every single source code change you'd have to rebuild the packages, copy them to docker-context-files, rebuild the image and upload it to kind. That is a terribly slow iteration cycle, because any change to "docker-context-files" invalidates the layer where the packages are installed, which means a lot of time lost rebuilding that layer.
   So instead of installing airflow from packages, for local kind testing we install airflow from sources (similarly to the CI images). Instead of preparing packages and copying them to docker-context-files, we simply COPY the "airflow" folder into the prod image and airflow is installed from those sources in editable mode. In the future we could even employ some framework to mount the airflow folder into the kind cluster, to avoid rebuilding altogether when the sources change (but that's a future optimisation, if needed). That saves at least a couple of minutes on every iteration of tests. This is case 3) of building the PROD image; see the sketch below.
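   
   Case 3) then boils down to overriding the COPY source/destination args (again just a sketch - the tag is made up, the arg values are the ones mentioned in this thread):
   
   ```bash
   # Case 3 (local kind testing): install from the checked-out sources, no packages involved.
   docker build . \
     --build-arg AIRFLOW_INSTALLATION_METHOD="." \
     --build-arg AIRFLOW_SOURCES_FROM="." \
     --build-arg AIRFLOW_SOURCES_TO="/opt/airflow" \
     --build-arg AIRFLOW_SOURCES_WWW_FROM="airflow/www" \
     --build-arg AIRFLOW_SOURCES_WWW_TO="/opt/airflow/airflow/www" \
     --tag my-airflow-prod:local-sources
   ```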
   
   In cases 1) and 2) we do not want to copy airflow sources (it slows down the docker build a lot and increases the size of the "build" segment a lot) - it also means that this layer of the image would be invalidated every time the airflow sources change. So by default, in those cases, we copy the "empty" directory - it's super fast and the layer will not be invalidated when you change sources. Unfortunately Dockerfile does not have "optional" steps - so if in case 3) we want to COPY airflow sources, then in cases 1) and 2) we also need to copy "something". I implemented it so that "what" and "where" we copy are passed as args, which works nicely - in 1) and 2) we copy the empty folder to the /empty dir, and in 3) we copy "airflow/www" to "/opt/airflow/airflow/www" and later "airflow" to "/opt/airflow/airflow". The optimisation above does it in two steps in order to optimise the case when nothing in "www" changes, so we save time on invalidating layers and rebuilding the "yarn" stuff - saving another good 30 seconds (or more if you have a slow connection) on rebuilding this layer.
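   
   The mechanism itself is roughly this (a simplified excerpt, not the literal Dockerfile content):
   
   ```dockerfile
   # "What" and "where" to copy are build args, so one COPY instruction serves all three cases.
   ARG AIRFLOW_SOURCES_WWW_FROM="empty"
   ARG AIRFLOW_SOURCES_WWW_TO="/empty"
   # Cases 1) and 2): copies the (essentially empty) "empty" directory - super fast and never
   # invalidated by source changes.
   # Case 3): overridden to "airflow/www" -> "/opt/airflow/airflow/www", so the asset/"yarn"
   # layers are rebuilt only when the www sources actually change; the rest of the sources
   # are copied in a later, separate step.
   COPY ${AIRFLOW_SOURCES_WWW_FROM} ${AIRFLOW_SOURCES_WWW_TO}
   ```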
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

