potiuk commented on a change in pull request #4938: [AIRFLOW-4117]
Multi-staging Image - Travis CI tests [Step 3/3]
URL: https://github.com/apache/airflow/pull/4938#discussion_r298820154
##########
File path: Dockerfile
##########
@@ -278,42 +278,75 @@ RUN echo "Pip version: ${PIP_VERSION}"
RUN pip install --upgrade pip==${PIP_VERSION}
-# We are copying everything with airflow:airflow user:group even if we use root to run the scripts
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ENV AIRFLOW_GITHUB_DOWNLOAD=https://raw.githubusercontent.com/${AIRFLOW_REPO}/${AIRFLOW_BRANCH}
+
+# We perform fresh dependency install at the beginning of each month from the scratch
+# This way every month we re-test if fresh installation from the scratch actually works
+# As opposed to incremental installations which does not upgrade already installed packages unless it
+# is required by setup.py constraints.
+ARG BUILD_MONTH
+
+# We get Airflow dependencies (no Airflow sources) from the master version of Airflow in order to avoid full
+# pip install layer cache invalidation when setup.py changes. This can be reinstalled from the
+# latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
+RUN mkdir -pv ${AIRFLOW_SOURCES}/airflow/bin \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.py >${AIRFLOW_SOURCES}/setup.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.cfg >${AIRFLOW_SOURCES}/setup.cfg \
Review comment:
You only need to build the Docker image once; after that, this layer is cached
by Docker. As long as BUILD_MONTH stays the same, Docker will not invalidate
the layer (it does not check whether master has changed or anything like
that).
If you simply run 'docker build .', BUILD_MONTH is always empty, so the layer
stays in the cache and the build never tries to download anything from GitHub.
If you use './hooks/build', the BUILD_MONTH arg changes every month, so once a
month the build re-downloads setup.py, setup.cfg etc. from master and
reinstalls all dependencies from scratch. For that you need internet access
(which you need for pip install anyway), but it is only triggered when you
actually run the build.
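
To make the cache behaviour above concrete, here is a minimal sketch of how a month-based build arg could be produced and passed in. The `YYYYMM` format and both function names are assumptions for illustration; they are not taken from './hooks/build'. The script only prints the command, it does not run Docker.

```shell
#!/usr/bin/env bash
# Sketch only: the YYYYMM format is an assumption, not copied from ./hooks/build.

# Print a value that stays constant within a calendar month.
build_month() {
    date +%Y%m
}

# Print (rather than run) the docker build command that passes the value in.
# While the value is unchanged, RUN instructions that follow "ARG BUILD_MONTH"
# can be served from cache; a new value causes a cache miss for them.
build_command() {
    echo "docker build --build-arg BUILD_MONTH=$(build_month) ."
}

build_command
```

Running plain 'docker build .' corresponds to leaving the arg empty every time, which is why that path never triggers the re-download.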
In most cases, if you change setup.py you will have to run the build. It will
use the cache up to the COPY -- line below, where it sees that setup.py
changed, and it runs 'pip install' right after that. But then you need
internet access to run pip install anyway. And again, this only matters when
you run docker build, which should not happen often (you only need it when you
rebase onto other changes or change setup.py). Sources from your host are
mounted inside the Docker container, so there is no need to rebuild the image
for local development at all.
However, deciding when to run docker build is indeed not obvious, and even
checking whether a cached layer is invalidated can take some time, so I
address this in #4932, which I will open shortly for proposal/voting. That is
the 'breeze' development environment, based on this new image. There I
implemented an additional md5sum check on crucial files (like setup.py or
Dockerfile): when you enter the environment with the `breeze` command, it
warns you that the local environment needs updating, and you can choose to
rebuild or to continue without it. This way, once you have run `breeze` at
least once, you can keep using it locally without any internet access, and
you get a warning when you need to update.
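
The md5sum check described above could look roughly like the sketch below. It is not the code from #4932: the function names, the cache directory layout and the warning text are all hypothetical, and it assumes GNU `md5sum` is available on the host.

```shell
#!/usr/bin/env bash
# Sketch only: names, cache layout and messages are illustrative assumptions.

# Return 0 if the file's checksum differs from the one stored at the last
# build (or if none is stored yet), 1 if it is unchanged.
file_changed() {
    local file="$1" cache_dir="$2"
    local stored current
    stored="${cache_dir}/$(basename "${file}").md5sum"
    current="$(md5sum "${file}" | cut -d' ' -f1)"
    [ ! -f "${stored}" ] || [ "${current}" != "$(cat "${stored}")" ]
}

# Record the current checksum; meant to be called after a successful rebuild.
store_checksum() {
    local file="$1" cache_dir="$2"
    mkdir -p "${cache_dir}"
    md5sum "${file}" | cut -d' ' -f1 > "${cache_dir}/$(basename "${file}").md5sum"
}

# Warn about every watched file that changed since the last recorded build.
# Returns 1 if at least one file changed, 0 otherwise.
warn_if_rebuild_needed() {
    local cache_dir="$1"
    shift
    local file status=0
    for file in "$@"; do
        if file_changed "${file}" "${cache_dir}"; then
            echo "WARNING: ${file} changed; the image may need rebuilding." >&2
            status=1
        fi
    done
    return "${status}"
}
```

A caller might run `warn_if_rebuild_needed .build-cache setup.py Dockerfile` when entering the environment and offer a rebuild only when it returns non-zero. Because the comparison is purely local, it works without internet access.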