potiuk commented on a change in pull request #11526:
URL: https://github.com/apache/airflow/pull/11526#discussion_r505563496



##########
File path: Dockerfile
##########
@@ -206,6 +206,9 @@ ENV AIRFLOW_LOCAL_PIP_WHEELS=${AIRFLOW_LOCAL_PIP_WHEELS}
 ARG SLUGIFY_USES_TEXT_UNIDECODE=""
 ENV SLUGIFY_USES_TEXT_UNIDECODE=${SLUGIFY_USES_TEXT_UNIDECODE}
 
+ARG INSTALL_ROVIDERS_FROM_SOURCES="true"

Review comment:
       This is exactly what I tried to start a discussion about here. Happy to 
hear your opinions on that.
   
   For now, I deliberately chose it to default to "true" by design. But I am 
open to discuss it and happy to hear your opinion/arguments why it should not 
be if you think otherwise.
   
   By default, when you are using production Dockerfile, we are installing 
Airflow from the sources. Actually, you even can't use the Dockerfile outside 
of sources because (as it is with most of the Dockerfiles) you need scripts, 
some extra files etc., and generally, the docker context where some relevant 
scripts/potentially binaries are stored. I think this is something that people 
will do rarely (we did it for example when developing the POC with the customer 
recently but then we were comfortable with pulling the latest version of 
airflow) - and eventually, we switched to installing airflow from binary pip 
packages in the image.
   
   So the default behavior for production Dockerfile when you run 'docker build 
.' is that it builds airflow from the sources. This is also consistent with 
what we are releasing as .src.tgz - you only need that source package (and all 
the apt/pip) to rebuild a fully functional Apache Airflow image in the very 
exact version the sources were released with. 
   
   So part of my proposal here - for anyone building the Docker Image from the 
sources(- is that they will still get all set of providers. This is backward 
compatible, convenient, and generally should only be used by developers.
   
   There is the second build mode where you use PIP packages (--build-arg 
INSTALL_AIRFLOW_VERSION=1.10.12) for example. And in this mode Airflow is 
installed from PyPI with the version specified. You still need Airflow sources 
for that (because you need the scripts) but I am thinking about a way to not 
require sources to build a custom image.  Happy to hear some suggestions here 
as well. 
   
   Another way of building the image is that you can use .whl packages placed 
in `docker-context-files` or even directly a version tagged in GitHub 
repository of Airflow (--build-arg INSTALL_AIRLFOW_REFERENCE=master). 
   
   In all those cases besides default building from sources, the 
INSTALL_PROVIDER_FROM_SOURCES has no effect and you only install the core 
package. Unless you specify [extra] (when this PR gets merged). In such case, 
those extra packages will be installed from PyPI or from other sources that you 
configure in .piprc - also configurable now. For example, you could configure 
.piprc to point to `/docker-context-files` as "source" of packages and place 
all the .whl provider packages there and then installing airflow[google] will 
get the packages from those wheels placed in the `docker-context-files` folder 
rather than PyPI. We have few flags that together eliminate entirely the needs 
of reaching out to any external server (besides APT mirrors) during the build. 
There are some details and examples in `docs/production-deployments.rst`.
   
   This is all highly optimized in the current way image is build, because 
those `docker-context-files` are only copied to the "build" segment of the 
image and never land in the final image.  This is also a very controlled, 
highly secure, perfectly sound and maintainable way of building Airflow image. 
It  is forward-compatible with the future version of Dockerfiles we might have) 
from vetted "local" binary sources - it does not require the customers to 
modify the Dockerfile and keep their own copy. This BTW is something that at 
least two of our huge customers either do in production now or will do very 
soon - switch to use this Dockerfile and use it rather than provide their own. 
So this is a highly requested feature from big customers - their security teams 
expect a buildable Dockerfile and they want to make sure it will work in a 
future version of Airflo without some modifications that have to be 
cherry-picked in new versions.
   
   As soon as we reach 2.0 /1.10.13 and I apply for the "official" status, I 
will make sure any future changes will be backward-compatible BTW. 
   
   So for me, this is by design to have `INSTALL_PROVIDERS_FROM_SOURCES` "true" 
by default.  But If you have some scenarios/concerns about it, I am happy to 
hear them.
   
    
   




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to