potiuk commented on a change in pull request #4938: [AIRFLOW-4117] Multi-staging Image - Travis CI tests [Step 3/3]
URL: https://github.com/apache/airflow/pull/4938#discussion_r299357268
 
 

 ##########
 File path: Dockerfile
 ##########
 @@ -278,42 +278,75 @@ RUN echo "Pip version: ${PIP_VERSION}"
 
 RUN pip install --upgrade pip==${PIP_VERSION}
 
-# We are copying everything with airflow:airflow user:group even if we use root to run the scripts
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ENV AIRFLOW_GITHUB_DOWNLOAD=https://raw.githubusercontent.com/${AIRFLOW_REPO}/${AIRFLOW_BRANCH}
+
+# We perform a fresh dependency install from scratch at the beginning of each month.
+# This way, every month we re-test whether a fresh installation from scratch actually works,
+# as opposed to incremental installations, which do not upgrade already-installed packages
+# unless required by setup.py constraints.
+ARG BUILD_MONTH
+
+# We get Airflow dependencies (no Airflow sources) from the master version of Airflow in order
+# to avoid full pip install layer cache invalidation when setup.py changes. This can be
+# reinstalled from the latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
+RUN mkdir -pv ${AIRFLOW_SOURCES}/airflow/bin \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.py >${AIRFLOW_SOURCES}/setup.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.cfg >${AIRFLOW_SOURCES}/setup.cfg \
 
 Review comment:
   It's really an optimisation only. If I copied setup.py from the local context, any change to it would invalidate all the subsequent layers, which means that every setup.py change would force a full reinstall of everything from scratch (see the sketch below).
   
   This is both good and bad. Good, because pip install runs from scratch. Bad, because it takes a lot of time every time you add or modify even a single dependency.
   
   When we make a production image we should certainly skip that step and always reinstall from scratch. But for CI images, and for subsequent steps like breeze or pre-commit hooks, I think it makes perfect sense to optimise for build speed.
   
   In the pattern I propose, we already have all the "fresh" dependencies installed from master, and only as the next step do we upgrade/downgrade them as needed according to the actual setup.py we have in the sources. This is much faster than installing from scratch - build time goes down from around 4 minutes to less than 1, which adds up in CI builds. Without it, whenever you change setup.py, the full pip install re-runs on every PR rebase until the change is merged into master (and the DockerHub build completes). A sketch of the ordering follows.
   
   The once-a-month full reinstall (which is now automated) is also good - it keeps us from accumulating incremental changes.
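   For illustration, the monthly cache bust only needs a changing build arg (the image tag and invocation here are hypothetical):
   
      # A new BUILD_MONTH value invalidates the Docker layer cache from
      # the 'ARG BUILD_MONTH' instruction onwards, forcing a clean
      # dependency install once a month.
      docker build --build-arg BUILD_MONTH="$(date +%Y-%m)" -t airflow-ci .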
   
   I think this optimisation is indeed a bit novel/unusual, but so far I have found no problems with it - you need network access for pip install anyway.
   
   What I can also do is make failures in this step (and the following pip install) non-terminal. That way, even if GitHub is not accessible, for example, the step will simply fail silently. This step is really idempotent, so the next step's pip install will work fine regardless of whether this "optimisation" step succeeds.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
