potiuk commented on a change in pull request #19210:
URL: https://github.com/apache/airflow/pull/19210#discussion_r751158036
##########
File path: Dockerfile
##########
@@ -160,13 +160,21 @@ ARG INSTALL_PROVIDERS_FROM_SOURCES="false"
# But it also can be `.` from local installation or GitHub URL pointing to specific branch or tag
# Of Airflow. Note That for local source installation you need to have local sources of
# Airflow checked out together with the Dockerfile and AIRFLOW_SOURCES_FROM and AIRFLOW_SOURCES_TO
-# set to "." and "/opt/airflow" respectively.
+# set to "." and "/opt/airflow" respectively. Similarly AIRFLOW_SOURCES_WWW_FROM/TO are set to right source
+# and destination
ARG AIRFLOW_INSTALLATION_METHOD="apache-airflow"
# By default latest released version of airflow is installed (when empty) but this value can be overridden
# and we can install version according to specification (For example ==2.0.2 or <3.0.0).
ARG AIRFLOW_VERSION_SPECIFICATION=""
# By default we do not upgrade to latest dependencies
ARG UPGRADE_TO_NEWER_DEPENDENCIES="false"
+# By default we install latest airflow from PyPI so we do not need to copy sources of Airflow
+# www to compile the assets but in case of breeze/CI builds we use latest sources and we override those
+# those SOURCES_FROM/TO with "airflow/wwww" and "/opt/airflow/airflow/wwww" respectively.
+# This is to rebuild the assets only when any of the www sources change
+ARG AIRFLOW_SOURCES_WWW_FROM="empty"
+ARG AIRFLOW_SOURCES_WWW_TO="/empty"
Review comment:
       The `empty` value is the solution that lets us build the image using either
released PyPI packages or local sources (the latter optimised for iteration speed
with kind tests).
       By default, when the prod docker image is built, you specify an Airflow version
(2.2.2 being the latest) and the image is built using that version from
PyPI. This is case 1) of building the PROD image - it does not matter at
all what local sources of airflow you have; the PyPI package is always installed.
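   A minimal sketch of case 1), using the build args visible in the diff above (the exact invocation Breeze generates may differ):
   ```shell
   # Case 1): PROD image from a released PyPI package - local sources are irrelevant
   docker build . \
     --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
     --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.2.2"
   ```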
       For testing and building the image in CI we need to use local sources - so
the PROD image in CI is built using locally prepared packages from sources. In
this case the "airflow" and "provider" packages are built locally and copied to
"docker-context-files" - and installed from there rather than from PyPI. This
is case 2) of building the image.
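   Case 2) could be sketched like this - note that the package-building step and the `INSTALL_FROM_DOCKER_CONTEXT_FILES` arg name are my assumptions about how CI wires this up, not shown in this comment:
   ```shell
   # Case 2): CI build from locally prepared packages (illustrative sketch)
   pip wheel --no-deps -w docker-context-files .   # build the airflow package locally
   docker build . \
     --build-arg INSTALL_FROM_DOCKER_CONTEXT_FILES="true"
   ```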
       However the very same Dockerfile is also used locally for "kind" tests -
using local sources. We do not have PyPI packages for those local sources,
and we do not want to rebuild the airflow package and copy it to
"docker-context-files", as that would mean very slow iteration. If we used
packages, then after every single source code change you'd have to rebuild the
packages, copy them to "docker-context-files", rebuild the image and upload it
to kind. That is terribly slow iteration, because any change to
"docker-context-files" invalidates the layer where the packages are installed,
and a lot of time is lost rebuilding that layer.
       So instead of building airflow from packages, for local kind testing we
install airflow from sources (similarly to the CI images). Instead of
preparing packages and copying them to "docker-context-files" we simply COPY the
"airflow" folder into the prod image, and airflow is installed from those sources
in editable mode. In the future we could even employ some
framework to mount the airflow folder into the kind cluster, to avoid rebuilding
altogether when the sources change (but that's a future optimisation, if needed).
This saves at least a couple of minutes on every test iteration. This is case
3) of building the PROD image.
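   Case 3) would then pass the source locations as build args - an illustrative sketch using the values quoted in the diff and in this comment:
   ```shell
   # Case 3): local kind testing - install airflow from COPY-ed local sources
   docker build . \
     --build-arg AIRFLOW_INSTALLATION_METHOD="." \
     --build-arg AIRFLOW_SOURCES_FROM="." \
     --build-arg AIRFLOW_SOURCES_TO="/opt/airflow" \
     --build-arg AIRFLOW_SOURCES_WWW_FROM="airflow/www" \
     --build-arg AIRFLOW_SOURCES_WWW_TO="/opt/airflow/airflow/www"
   ```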
       In cases 1) and 2) we do not want to copy the airflow sources (it slows down the
docker build a lot and increases the size of the "build" segment a lot) - it would
also mean that this layer of the image gets invalidated every time the airflow
sources change. So by default in those cases we copy the "empty" directory - that is
super fast and will not invalidate the layer when you change sources. Unfortunately
Dockerfile does not have "optional" steps - so if in case 3) we want to COPY the
airflow sources, then in cases 1) and 2) we also need to copy "something". I
implemented it so that "what" and "where" we copy are passed as args,
which works nicely - in 1) and 2) we copy the empty folder to the /empty dir, and in 3) we
copy "airflow/www" to "/opt/airflow/airflow/www" and later "airflow" to
"/opt/airflow/airflow". The optimisation above does it in two steps, in
order to optimise the case when nothing in "www" changes, so we save time on
invalidating layers and rebuilding the "yarn" stuff - saving another good 30
seconds (or more if you have a slow connection) on rebuilding this layer.
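   The arg-driven two-step COPY could look roughly like this (a sketch of the idea, not the verbatim Dockerfile - the asset-compilation step in between is elided):
   ```dockerfile
   # Cases 1) and 2): both args point at the "empty" dir, so this COPY is cheap
   # and never invalidated by airflow source changes.
   # Case 3): www sources are copied first, so the yarn asset build below is
   # only re-run when something under airflow/www actually changes.
   COPY ${AIRFLOW_SOURCES_WWW_FROM} ${AIRFLOW_SOURCES_WWW_TO}
   # ... compile www assets (yarn) here ...
   COPY ${AIRFLOW_SOURCES_FROM} ${AIRFLOW_SOURCES_TO}
   ```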
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]