potiuk commented on code in PR #35150:
URL: https://github.com/apache/airflow/pull/35150#discussion_r1369988440


##########
Dockerfile:
##########
@@ -48,7 +48,7 @@ ARG AIRFLOW_VERSION="2.7.2"
 
 ARG PYTHON_BASE_IMAGE="python:3.8-slim-bullseye"
 
-ARG AIRFLOW_PIP_VERSION=23.3
+ARG AIRFLOW_PIP_VERSION=23.3.1

Review Comment:
   We do not really want that. Pinning `pip` to a specific version and manually 
upgrading it via a PR when a new version is released is a deliberate decision.
   
   Here is the reasoning:
   
   Bumping the `pip` version is a rare event (releases usually come a bit faster 
around the .1 version to fix teething problems, but then they slow down), and 
`pip` does not follow SemVer but a modified CalVer (23 = year, 3 = quarter, 
1 = number of the release - the patchlevel - within the quarter).
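
   To make the scheme above concrete, here is a minimal sketch that splits a 
`pip` version string into the three parts described in this comment. The names 
`PipVersion` and `parse_pip_version` are hypothetical helpers for illustration 
only, not part of `pip` or Airflow, and the reading of the second component as 
a quarter simply follows the description above.

```python
from typing import NamedTuple


class PipVersion(NamedTuple):
    year: int        # first component, e.g. 23
    release: int     # second component, described above as the quarter
    patchlevel: int  # third component, treated as 0 when omitted


def parse_pip_version(version: str) -> PipVersion:
    """Split a version such as "23.3.1" into (year, release, patchlevel)."""
    parts = [int(p) for p in version.split(".")]
    if not 2 <= len(parts) <= 3:
        raise ValueError(f"unexpected pip version format: {version!r}")
    patchlevel = parts[2] if len(parts) == 3 else 0
    return PipVersion(parts[0], parts[1], patchlevel)


# The two values from the Dockerfile diff in this PR.
old = parse_pip_version("23.3")
new = parse_pip_version("23.3.1")

# Same year and release but a different patchlevel: a patchlevel-only bump.
is_patchlevel_bump = old[:2] == new[:2] and old.patchlevel != new.patchlevel
print(old, new, is_patchlevel_bump)
```

   Under this reading, the change in this PR is a patchlevel-only bump, which 
is exactly the kind of upgrade we still want to go through a manual PR.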
   
   Even patchlevel releases have introduced breaking changes that broke our 
builds in the past. Those releases are relatively infrequent, so I prefer to 
keep the exact version of `pip` in our image. While technically (see 
https://pip.pypa.io/en/stable/development/release-process/) the `patchlevel` 
releases should only contain bugfixes, in the past they have been breaking for 
us (and others). This was sometimes also connected with other dependencies 
used in the Python build toolchain (think setuptools, Cython etc.).
   
   This is a pretty special case, as a `pip` release has the potential to 
totally break our CI / builds. The dependency resolution algorithm in `pip` is 
a big part of our `automated constraints update` toolchain. Also, `pip` 
maintainers are (rightfully, I think) more concerned with following new PEPs 
than with preserving compatibility, and quite a number of breaking changes 
resulted from implementing new PEPs, for example. So we should adjust and make 
even a "patchlevel" upgrade a manual effort. Usually there are at most 2 
upgrades of the `patchlevel` version.
   
   One more reason: keeping it fixed produces much more reproducible Docker 
images (from the Python point of view). While our images are not fully 
reproducible when built (for example, they can get a newer Python patchlevel or 
newer OS dependencies), having a fixed `pip` and constraints makes the rebuild 
fairly consistent when it comes to Python dependencies. This will become 
really crucial when we complete our SBOM work (we are working on it as part of 
Security improvements with @hussein-awala @eladkal and @pierrejeambrun). Once 
`pip` is pinned, the SBOMs for the Python side of a released Airflow version 
are very strictly "pinned", i.e. for each version of Airflow you will be able 
to deterministically say "These are the versions of dependencies that version 
x.y.z of Airflow has been released with (same for providers)". This will be 
the base for anyone looking at SBOM information to determine whether they 
should upgrade anything due to security issues. That's why it's pretty 
important to keep all Python dependencies (including `pip`) fixed per version 
of Airflow.
   
   We do this via constraints in Airflow, but `pip` is used to manage the 
constraints, so this is a bit of a chicken-and-egg problem. Strictly pinning 
`pip` in images helps to achieve a really reproducible state of dependencies.
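
   The ordering behind the chicken-and-egg problem can be sketched as two 
install steps: first pin `pip` itself, then let that exact `pip` apply the 
constraints. This is only an illustration, not the actual image build logic. 
The version values come from the diff above, the constraints URL pattern is an 
assumption based on how Airflow constraint files are published, and 
`build_install_commands` is a hypothetical helper. The sketch only prints the 
commands and does not run them.

```python
import shlex
import sys
from typing import List

# Values taken from the Dockerfile diff in this PR.
AIRFLOW_VERSION = "2.7.2"
AIRFLOW_PIP_VERSION = "23.3.1"
PYTHON_MAJOR_MINOR = "3.8"

# Assumption: URL pattern of the published Airflow constraint files.
CONSTRAINTS_URL = (
    "https://raw.githubusercontent.com/apache/airflow/"
    f"constraints-{AIRFLOW_VERSION}/constraints-{PYTHON_MAJOR_MINOR}.txt"
)


def build_install_commands() -> List[List[str]]:
    """Return the two pip invocations in the order they need to run."""
    # Step 1: pin pip itself, since constraints cannot do this for us.
    pin_pip = [
        sys.executable, "-m", "pip", "install",
        f"pip=={AIRFLOW_PIP_VERSION}",
    ]
    # Step 2: the pinned pip resolves Airflow against the constraints file.
    install_airflow = [
        sys.executable, "-m", "pip", "install",
        f"apache-airflow=={AIRFLOW_VERSION}",
        "--constraint", CONSTRAINTS_URL,
    ]
    return [pin_pip, install_airflow]


commands = build_install_commands()
for command in commands:
    print(shlex.join(command))
```

   With both the resolver (`pip`) and its input (the constraints file) fixed, 
the only moving parts left in a rebuild are the ones mentioned earlier, such as 
the Python patchlevel and OS dependencies.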



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
