potiuk commented on a change in pull request #4938: [AIRFLOW-4117]
Multi-staging Image - Travis CI tests [Step 3/3]
URL: https://github.com/apache/airflow/pull/4938#discussion_r299441892
##########
File path: Dockerfile
##########
@@ -278,42 +278,75 @@ RUN echo "Pip version: ${PIP_VERSION}"
RUN pip install --upgrade pip==${PIP_VERSION}
-# We are copying everything with airflow:airflow user:group even if we use
root to run the scripts
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ENV
AIRFLOW_GITHUB_DOWNLOAD=https://raw.githubusercontent.com/${AIRFLOW_REPO}/${AIRFLOW_BRANCH}
+
+# We perform fresh dependency install at the beginning of each month from the
scratch
+# This way every month we re-test if fresh installation from the scratch
actually works
+# As opposed to incremental installations which does not upgrade already
installed packages unless it
+# is required by setup.py constraints.
+ARG BUILD_MONTH
+
+# We get Airflow dependencies (no Airflow sources) from the master version of
Airflow in order to avoid full
+# pip install layer cache invalidation when setup.py changes. This can be
reinstalled from the
+# latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
+RUN mkdir -pv ${AIRFLOW_SOURCES}/airflow/bin \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.py >${AIRFLOW_SOURCES}/setup.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.cfg >${AIRFLOW_SOURCES}/setup.cfg
\
Review comment:
It's not only for local development. It's also for CI builds. Just to
explain what's going on here:
The only way to move setup.py to docker context locally is to run COPY or
ADD operation. Whenever setup.py changes and is part of the COPY/ADD then it
means that Dockerfile gets invalidated from that point on.
The case for CI this pattern solves is when your PR adds a new dependency.
Without it - when you make A PR with dependency added in setup.py - the
COPY/ADD causes that whole 'pip install' layer gets invalidated and
re-installed from the scratch. This means that every job (pylint/mypy/tests
etc.) will have to first perform the whole `pip install` from the scratch
adding 4-5 minutes to the build time.
It is a bit strange pattern indeed. Simply because Docker build does not
support this pattern yet :(. Howewer this change is not for ever and
Docker/moby guys realise that. There are layer types (including cache
specifically foreseen for that kind of behaviour) in the upcoming buildkit:
https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md#run---mounttypecache
- but it will likely take months to release stable version with buildkit that
we can use (but when they will, I am super happy to update the docker image to
follow it).
Also we cannot do the single line you mentioned. In order to do that we
would have to add `COPY .` (which we do eventually few lines below). Doing it
will make it even worse - every single change to airflow sources will
invalidate everything and will force 'pip install' to run from the scratch
basically with every commit. That's why I separately COPY setup.py, setup.cfg
bin/airflow first - because they do not change that often and they are needed
to run the first pip install.
I see your concern. While I understand it's not perfect solution, I think it
is really good for CI optimisations until we have buildkit. What I can do
instead - I can make those steps already controlled with
"OPTIMISE_FOR_CI_BUILD" argument and skip those steps entirely when it is not
set to "true" (which I will do in the CI/breeze/pylint/mypy cases). Then the
"non-CI-build-optimised" images will run just single "pip install" - the last
one in the Dockerfile :). Also that will make it obvious that this is an
optimisation rather than integral part of the "basic" Dockerfile.
Would that be a good solution ?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services