potiuk commented on a change in pull request #4938: [AIRFLOW-4117] Multi-staging Image - Travis CI tests [Step 3/3]
URL: https://github.com/apache/airflow/pull/4938#discussion_r298820154
 
 

 ##########
 File path: Dockerfile
 ##########
 @@ -278,42 +278,75 @@ RUN echo "Pip version: ${PIP_VERSION}"
 
 RUN pip install --upgrade pip==${PIP_VERSION}
 
-# We are copying everything with airflow:airflow user:group even if we use root to run the scripts
+ARG AIRFLOW_REPO=apache/airflow
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+
+ARG AIRFLOW_BRANCH=master
+ENV AIRFLOW_BRANCH=${AIRFLOW_BRANCH}
+
+ENV AIRFLOW_GITHUB_DOWNLOAD=https://raw.githubusercontent.com/${AIRFLOW_REPO}/${AIRFLOW_BRANCH}
+
+# We perform fresh dependency install at the beginning of each month from the scratch
+# This way every month we re-test if fresh installation from the scratch actually works
+# As opposed to incremental installations which does not upgrade already installed packages unless it
+# is required by setup.py constraints.
+ARG BUILD_MONTH
+
+# We get Airflow dependencies (no Airflow sources) from the master version of Airflow in order to avoid full
+# pip install layer cache invalidation when setup.py changes. This can be reinstalled from the
+# latest master by increasing PIP_DEPENDENCIES_EPOCH_NUMBER.
+RUN mkdir -pv ${AIRFLOW_SOURCES}/airflow/bin \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.py >${AIRFLOW_SOURCES}/setup.py \
+ && curl -L ${AIRFLOW_GITHUB_DOWNLOAD}/setup.cfg >${AIRFLOW_SOURCES}/setup.cfg \
 
 Review comment:
   You only need to build the Docker image once; after that this layer is 
cached by Docker. As long as BUILD_MONTH stays the same, Docker will not 
invalidate this layer (it does not check whether master has changed or anything 
like that).
   If you simply run 'docker build .', BUILD_MONTH will always be empty, so the 
layer stays in the cache and the build never tries to download anything from 
GitHub.
   
   If you use './hooks/build', the BUILD_MONTH arg changes every month, so once 
a month the build re-downloads setup.py, setup.cfg etc. from master and 
reinstalls all dependencies from scratch. For that you need internet access 
(which you need for pip install anyway), but it is only triggered when you 
actually run the build.
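
   A minimal sketch of how a build script could derive a month-based value and 
pass it as the BUILD_MONTH build argument. This is not the actual content of 
'./hooks/build'; the function names and the YYYY-MM format are assumptions made 
for illustration. The script only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical sketch; not taken from ./hooks/build.

# Produce a value that stays constant within a calendar month (e.g. 2019-06).
# The YYYY-MM format is an assumption; any value that changes monthly would do.
build_month() {
    date -u +%Y-%m
}

# Print the docker build command that passes the value as a build argument.
# While the value is unchanged, Docker can reuse the cached layers after
# "ARG BUILD_MONTH"; once it changes, the RUN steps after the ARG miss the cache.
build_command() {
    printf 'docker build --build-arg BUILD_MONTH=%s .\n' "$(build_month)"
}

build_command
```

   Running plain 'docker build .' corresponds to leaving out the --build-arg 
flag, so the argument stays empty and the cached layer keeps being reused.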
   
   
   In most cases, if you change setup.py you will have to run the build. It 
will use the cache up to the COPY -- line below, where it will see that 
setup.py changed, and it will run 'pip install' right after that. But you need 
internet access to run pip install anyway. And again, this is only needed when 
you run docker build, which should not happen often (you only need it when you 
rebase onto other changes or change setup.py). Sources from your host are 
mounted inside the Docker container, so there is no need to rebuild the image 
for local development at all.
   
   However, the decision on when to run docker build is indeed not obvious, and 
even checking whether a cached layer is invalidated can take some time, so I 
address it in #4932, which I will shortly open for proposal/voting. This is the 
'breeze' development environment based on this new image. There I implemented 
additional md5sum checks on crucial files (like setup.py or Dockerfile): when 
you enter the environment with the `breeze` command, it warns you that the 
local environment needs updating (and you have the option to rebuild or 
continue without it). This way, once you have run `breeze` at least once, you 
can keep using it locally without any internet access and you will get warned 
when you need to update.
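
   The md5sum check described above could look roughly like the sketch below. 
It is not the breeze implementation: the function names, the cache directory 
layout and the warning text are all hypothetical, and it assumes the GNU 
`md5sum` tool is available.

```shell
#!/bin/sh
# Hypothetical sketch; names and layout are illustrative, not from breeze.

# Print only the md5 hash of a file (assumes GNU coreutils md5sum).
checksum_of() {
    md5sum "$1" | cut -d ' ' -f 1
}

# Succeed (return 0) if the file has no recorded checksum or differs from it.
needs_rebuild() {
    file="$1"
    cache_dir="$2"
    recorded="${cache_dir}/$(basename "${file}").md5sum"
    if [ ! -f "${recorded}" ]; then
        return 0
    fi
    [ "$(checksum_of "${file}")" != "$(cat "${recorded}")" ]
}

# Store the current checksum, e.g. right after a successful image build.
record_checksum() {
    file="$1"
    cache_dir="$2"
    mkdir -p "${cache_dir}"
    checksum_of "${file}" > "${cache_dir}/$(basename "${file}").md5sum"
}

# Warn about every tracked file that changed since its checksum was recorded.
warn_if_stale() {
    cache_dir="$1"
    shift
    for file in "$@"; do
        if needs_rebuild "${file}" "${cache_dir}"; then
            echo "WARNING: ${file} changed since the last build; the image may need rebuilding."
        fi
    done
}
```

   A wrapper could call `warn_if_stale` with files such as setup.py and 
Dockerfile on entry, and `record_checksum` for each of them after a rebuild. 
The check itself only reads local files, so it needs no internet access.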
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
