ashb commented on a change in pull request #4938: [AIRFLOW-4117] Multi-staging Image - Travis CI tests [Step 3/3]
URL: https://github.com/apache/airflow/pull/4938#discussion_r299352960
##########
File path: Dockerfile
##########
@@ -278,42 +278,75 @@ RUN echo "Pip version: ${PIP_VERSION}"
RUN pip install --upgrade pip==${PIP_VERSION}
-# We are copying everything with airflow:airflow user:group even if we use root to run the scripts
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ENV AIRFLOW_GITHUB_DOWNLOAD=https://raw.githubusercontent.com/${AIRFLOW_REPO}/${AIRFLOW_BRANCH}
+
+# We perform fresh dependency install at the beginning of each month from the scratch
+# This way every month we re-test if fresh installation from the scratch actually works
+# As opposed to incremental installations which does not upgrade already installed packages unless it
+# is required by setup.py constraints.
+ARG BUILD_MONTH
+
+# We get Airflow dependencies (no Airflow sources) from the master version of Airflow in order to avoid full
+# pip install layer cache invalidation when setup.py changes. This can be reinstalled from the
+# latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
+RUN mkdir -pv ${AIRFLOW_SOURCES}/airflow/bin \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.py >${AIRFLOW_SOURCES}/setup.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.cfg >${AIRFLOW_SOURCES}/setup.cfg \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/airflow/version.py >${AIRFLOW_SOURCES}/airflow/version.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/airflow/__init__.py >${AIRFLOW_SOURCES}/airflow/__init__.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/airflow/bin/airflow >${AIRFLOW_SOURCES}/airflow/bin/airflow
+
+# Airflow Extras installed
+ARG AIRFLOW_EXTRAS="all"
+ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}
+RUN echo "Installing with extras: ${AIRFLOW_EXTRAS}."
+
+RUN pip install --no-use-pep517 -e ".[${AIRFLOW_EXTRAS}]"
+
+# Note! We are copying everything with airflow:airflow user:group even if we use root to run the scripts
# This is fine as root user will be able to use those dirs anyway.
# Airflow sources change frequently but dependency configuration won't change that often
# We copy setup.py and other files needed to perform setup of dependencies
-# This way cache here will only be invalidated if any of the
-# version/setup configuration change but not when airflow sources change
+# So in case setup.py changes we can install latest dependencies required.
COPY --chown=airflow:airflow setup.py ${AIRFLOW_SOURCES}/setup.py
COPY --chown=airflow:airflow setup.cfg ${AIRFLOW_SOURCES}/setup.cfg
COPY --chown=airflow:airflow airflow/version.py ${AIRFLOW_SOURCES}/airflow/version.py
COPY --chown=airflow:airflow airflow/__init__.py ${AIRFLOW_SOURCES}/airflow/__init__.py
COPY --chown=airflow:airflow airflow/bin/airflow ${AIRFLOW_SOURCES}/airflow/bin/airflow
-# Airflow Extras installed
-ARG AIRFLOW_EXTRAS="all"
-ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}
-RUN echo "Installing with extras: ${AIRFLOW_EXTRAS}."
-
-# First install only dependencies but no Apache Airflow itself
-# This way regular changes in sources of Airflow will not trigger reinstallation of all dependencies
-# And this Docker layer will be reused between builds.
+# The goal of this line is to install the dependencies from the most current setup.py from sources
+# This will be usually incremental small set of packages so it will be very fast
RUN pip install --no-use-pep517 -e ".[${AIRFLOW_EXTRAS}]"
COPY --chown=airflow:airflow airflow/www/package.json ${AIRFLOW_SOURCES}/airflow/www/package.json
COPY --chown=airflow:airflow airflow/www/package-lock.json ${AIRFLOW_SOURCES}/airflow/www/package-lock.json
WORKDIR ${AIRFLOW_SOURCES}/airflow/www
+ARG BUILD_NPM=true
+ENV BUILD_NPM=${BUILD_NPM}
+
# Install necessary NPM dependencies (triggered by changes in package-lock.json)
-RUN gosu ${AIRFLOW_USER} npm ci
+RUN \
+ if [[ "${BUILD_NPM}" == "true" ]]; then \
+ gosu ${AIRFLOW_USER} npm ci; \
+ fi
COPY --chown=airflow:airflow airflow/www/ ${AIRFLOW_SOURCES}/airflow/www/
# Package NPM for production
-RUN gosu ${AIRFLOW_USER} npm run prod
+RUN \
+ if [[ "${BUILD_NPM}" == "true" ]]; then \
+ gosu ${AIRFLOW_USER} npm run prod; \
+ fi
Review comment:
Probably a further enhancement (i.e. not in this PR), but is it worth making a separate npm-only image and doing the `npm run prod` in there?
Maybe this is where the flavours of image come in - for CI/dev, having npm in the same image makes sense, but for prod it would be nice to not have to install Node.
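
A hedged sketch of that idea (not part of this PR): a dedicated Node build stage runs `npm ci` and `npm run prod`, and the production stage only copies the compiled output, so it never needs Node installed. Stage names, base images and the `static/dist` output path below are illustrative assumptions, not taken from the PR.

```dockerfile
# Sketch only: stage names, base images and the asset output path are assumptions.
FROM node:10-slim AS www-builder

WORKDIR /airflow/www

# Install JS dependencies first so this layer is reused while package-lock.json is unchanged
COPY airflow/www/package.json airflow/www/package-lock.json ./
RUN npm ci

# Copy the remaining www sources and build the production assets
COPY airflow/www/ ./
RUN npm run prod

FROM python:3.6-slim AS prod
# ... install Airflow and its Python dependencies here, with no Node/npm at all ...

# Copy only the compiled assets from the builder stage (assuming `npm run prod`
# writes them under airflow/www/static/dist, which may differ in practice)
COPY --from=www-builder /airflow/www/static/dist /opt/airflow/airflow/www/static/dist
```

The CI/dev flavour could keep building assets in place (or keep the `BUILD_NPM` switch from the diff above), while the prod flavour would rely solely on the artifacts copied out of the builder stage.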
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services