potiuk commented on pull request #19189:
URL: https://github.com/apache/airflow/pull/19189#issuecomment-950887593


   > I've been hesitant to propose this since this is technically a 
backward-incompatible change for those using PROD as a base image for their 
Dockerfile, the most significant part being (obviously) the location of the 
interpreter. So while I think this is a good thing to do in a vacuum, this 
should probably be done by introducing a new image tag series and 
deprecating the venv-less one until 3.0.
   
   First of all, I do not think this is backwards-incompatible. Secondly, I do 
not really think this would be a problem even if it were, because Airflow 
compatibility has nothing to do with image compatibility (especially since 
our image is not yet an "official stable" image - it's a "reference" image).
   
   Why do I think it is not incompatible?
   
   Because all the examples and recommendations we had about extending and 
customising the image remain unchanged. The image, Airflow, providers and all 
the tools inside will continue to work if people were using our examples 
and following them (and we have PLENTY of them). Even more - those examples are 
automatically validated during the CI build (except image customisation, which 
I run separately every time I make a significant change like this one), so I am 
pretty sure they are working fine. All our prod "image tests" are also working 
fine with it (we test if all the imports work, if all providers are installed 
and are importable, etc.). From the point of view of a user who either 
customizes or extends the image, nothing changes. The only change is where 
the packages are installed. But if they use (as they should) `pip` to 
manipulate their packages, nothing changes.
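
   As a rough illustration of the kind of "do all the imports work" check 
mentioned above, here is a minimal Python sketch. It is not the actual Airflow 
image test code; the helper name `find_unimportable` and the module list are 
hypothetical placeholders, and a real check would list the Airflow and provider 
modules expected in the image.

```python
import importlib


def find_unimportable(module_names):
    """Return a dict mapping each module that fails to import to its error."""
    failures = {}
    for name in module_names:
        try:
            importlib.import_module(name)
        except Exception as exc:  # a broken install can raise more than ImportError
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


if __name__ == "__main__":
    # Placeholder list (an assumption for this sketch): replace with the
    # modules that are expected to be importable inside the image.
    modules_to_check = ["json", "email", "no_such_module_for_this_sketch"]
    for name, error in find_unimportable(modules_to_check).items():
        print(f"FAILED {name}: {error}")
```

   A check like this only depends on what is importable, not on where the 
packages live on disk, which is why moving them should not affect it.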
   
   Even if they manually added the `--user` flag to their `pip` commands, this 
will continue to work (except in some really obscure cases) - although they were 
not even encouraged to do that - we had the PIP_USER variable set in the image, 
which made this behaviour automatic (and this variable is gone with that change).
   
   This is really equivalent to refactoring code which is not "public" API in 
Python. The "location" of the installed packages is not "public API". The `pip` 
commands to manipulate those are the API (and those have not changed).
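
   To make the "location is not public API" point concrete, here is a small 
Python sketch, using only the standard library, of how a script can discover 
the interpreter and package location at runtime instead of hardcoding paths. 
The function names are hypothetical and this is not something shipped in the 
image; it is only meant to show that code written this way does not care 
whether packages live in a user site directory or in a virtualenv.

```python
import sys
import sysconfig


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the venv, while sys.base_prefix
    # points at the interpreter the venv was created from.
    return sys.prefix != sys.base_prefix


def describe_environment() -> dict:
    """Collect interpreter and package locations at runtime."""
    return {
        "interpreter": sys.executable,
        "site_packages": sysconfig.get_paths()["purelib"],
        "in_virtualenv": in_virtualenv(),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```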
   
   Now why this would not be a big problem even if it was more 
"backwards-incompatible"? 
   
   The `Airflow X.Y` compatibility is all about "Airflow", not about the image. 
There is no "guarantee" that the image will remain unchanged - in fact we have 
made quite a number of incompatible changes to the variable names used when 
customizing the image in the past, without any major disruption to our users. 
The image we publish is not an "official" release - it is a "convenience binary" 
and I often even call it a "reference image". It does not bring the same 
"guarantees" as an official release; details of it can change without breaking 
Airflow MAJOR version compatibility. I try - of course - not to do it, and I 
think we had far more of those changes between 2.0.0 and 2.1.0, when we got a 
lot of feedback from users (for example, OpenShift compatibility came from that) 
and were able to incorporate a lot of it without waiting for Airflow 3. That's a 
major win for the quality of the image, I think. Even the Python base images 
made some backwards-incompatible changes in the past. For example, the 3.* 
images were suddenly replaced with ones that had Python 2.7 removed (!), 
without even bumping the patch level (!). That's not a "nice" approach of 
course - but technically speaking it did not break Python 3.* compatibility 
(otherwise they would have had to wait until version 4 to release images 
without Python 2.7).
   
   This situation will change however (from my point of view at least - 
apparently Python maintainers have a different view on that) when we apply for 
the "official docker image status" - 
https://docs.docker.com/docker-hub/official_images/. Then I would be far more 
careful about similar changes. This is about the last two changes I am still 
hesitant about completing because there were a few open things (like the 
.venv). When I look at the rate of change of the image, it has stabilized 
significantly. We handle all the cases we want to handle, the API to build 
those images was significantly simplified and made more intuitive, and we have 
had far more issues raised by users where my answer is "Yes - this is supported 
already by the image, see the doc here" (for example when people want to build 
the image in an air-gapped environment, or when they want to verify the 
provenance of all the Python packages, or when they want to add a custom 
entrypoint, etc.).
   
   I was building up the knowledge and documentation and I think I am rather 
close to saying "yeah, we are ready to get the official image status". By then 
I also plan to extract a separate "read-only" repo where only relevant files 
will be present (I plan to use `copybara` to only copy relevant commits/code 
from the Airflow repo), and then it will be much easier for people to 
"officially" build their custom images, and the image will actually be built 
automatically by Docker's official team. Plus we will get extra security checks 
and notifications, as the "official" images by Docker get special treatment and 
get some automated scanning and notifications - and then we will likely also 
have to build a bit faster loop for rebuilding the images when security issues 
are discovered in the base image (but that's another topic to be discussed when 
we apply for the "official" status). Then such images will be available to pull 
as `docker pull apache-airflow` and then yeah - I agree such a change could be 
seen as backwards-incompatible.
   
   See the issues there: https://github.com/apache/airflow/projects/3  - not 
having "official" status is the only reason why AIP-26 is still "in-progress".



