potiuk commented on a change in pull request #10784:
URL: https://github.com/apache/airflow/pull/10784#discussion_r490609617



##########
File path: Dockerfile.ci
##########
@@ -256,7 +256,7 @@ ENV 
UPGRADE_TO_LATEST_CONSTRAINTS=${UPGRADE_TO_LATEST_CONSTRAINTS}
 # But in cron job we will install latest versions matching setup.py to see if 
there is no breaking change
 # and push the constraints if everything is successful
 RUN \
-    if [[ "${UPGRADE_TO_LATEST_CONSTRAINTS}" == "true" ]]; then \
+    if [[ "${UPGRADE_TO_LATEST_CONSTRAINTS}" != "false" ]]; then \

Review comment:
       Everything will work, I believe (but it would be great to check if there 
are no loopholes).
   
   This part has not changed and we run it for > year now. Only the location 
changed to run the scripts  -  the scripts are run in  the "workflow_run". Here 
we build the image using `scripts/ci` taken from master, but the `Dockerfile`, 
`setup.py`, `scripts/in_container` and everything else is coming from the PR.
   
   In Dockerfile.ci there are two steps of pip  install:
   
   1)  The first one: 
https://github.com/apache/airflow/blob/5b3fb53d9c5c943abb8ddc90214c320b7a11c8c2/Dockerfile.ci#L223
 installs deps from master. This one only invalidates when we rebuild the image 
from scratch (i.e. when we change Dockerfile or when we get a new base python 
image. In all other cases, this one is taken from cache - even if setup.py 
changes. This is because we COPY setup.py in one of the following lines (not 
before). It's really an optimization - thanks to that, even if setup.py 
changes, it does not take 20 minutes to build the image but maybe 6-7 minutes 
(because all dependencies from the previous master run are already installed in 
the layer taken from the cache. 
   
   2) then we copy setup.py: 
https://github.com/apache/airflow/blob/master/Dockerfile.ci#L243 - if setup.py 
is not changed vs. latest master build - this layer is also taken from cache as 
well as the subsequent step of pip install (this way if no setup.py changes the 
image build is ~ 3/4 minutes - we do not lose time on even checking for new 
requirements. But if the setup.py changes (setup.py comes from the PR - just to 
remind) then Docker cache gets invalidated here and all later layers of the 
image are built from here.
   
   3) Then if setup.py changed in the previous step we reach that line during 
rebuilding the layers: 
https://github.com/apache/airflow/blob/master/Dockerfile.ci#L258. 
   In "normal" PR's (UPGRADE_TO_LATEST_CONSTRAINT = "false") we run this 
command:
    `pip install -e ".[${AIRFLOW_EXTRAS}]"   --constraint 
"${AIRFLOW_CONSTRAINTS_URL}" `
   
   This will:
   a) install all the new stuff that appeared in setup.py (so here where we 
install new packages)
   b) upgrade all the stuff for which limits were changed in setup.py and the 
limits do not fit either constraints or already installed versions
   c) but all the rest remains untouched.
   
   This way PR gets only the previously "good" constrains + anything that 
changed in the setuup.py for this PR (this is where we avoid the 'package not 
found`).
   
   Basically it means that it will only try to upgrade or install new deps that 
freshly appeared in setup.py. But if the previous requirements are fitting the 
setup.py limits, no "eager" upgrade of those requirements is performed.
   
   In the other case (UPGRADE_TO_LATEST_CONSTRAINT = "commit hash") the same 
pip install is run with `--upgrade --upgrade-strategy eager and NO 
constraints'. This is only run in case we run a "master merge" push or 
scheduled build. This  means that pip will try to upgrade all the dependencies 
to their most recent versions that fulfill the setup.py limitations  - without 
looking at the last constraints. If such a build succeeds (all tests pass), 
then we upgrade constraints to match those newly upgraded versions (because we 
know that they passed the tests). This is recorded as the commit message in teh 
orphaned "constraints-master" or constraints-1-10 branches (depending if it was 
a master or v1-10-test push):
   
   Example here: 
https://github.com/apache/airflow/commit/625194596587bd8b3ef54f1902b6060e4a0c187b
 - you see which commit originated it and what exactly changes so we can easily 
track the history of upgrades of those.
   
   Once the image is build (with potentially new requirements), it is pushed to 
registry and tests for the run that triggered the build are run using this 
image.
   
   
   Why check-ing for "commit hash" - because if it was "true" then this line 
woudl only be invalidated if anything changed in the "setup.py" above.. By 
adding commit hash here, we make sure to invalidate this line with every merge 
push, and it means that we will be running "--upgrade --upgrade-strategy" every 
time we merge to master. This way we will have bigger number of smaller 
incremental changes (with newly released versions of dependent libraries" every 
time we merge.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to