mik-laj commented on a change in pull request #14911:
URL: https://github.com/apache/airflow/pull/14911#discussion_r599151779
##########
File path: docs/docker-stack/build.rst
##########
@@ -262,119 +413,99 @@ The ``pip download`` might happen in a separate environment. The files can be co
 binary repository and vetted/verified by the security team and used subsequently to build images of Airflow when needed on an air-gaped system.

-Preparing the constraint files and wheel files:
+Example of preparing the constraint files and wheel files (note that ``mysql`` dependency is removed
+as ``mysqlclient`` is installed from Oracle's ``apt`` repository and if you want to add it, you need
+to provide this library from you repository if you want to build Airflow image in an "air-gaped" system.

-.. code-block:: bash
+.. exampleinclude:: docker-examples/restricted/restricted_environments.sh
+    :language: bash
+    :start-after: [START download]
+    :end-before: [END download]

-  rm docker-context-files/*.whl docker-context-files/*.txt
+After this step is finished, your ``docker-context-files`` folder will contain all the packages that
+are needed to install Airflow from.

-  curl -Lo "docker-context-files/constraints-2-0.txt" \
-    https://raw.githubusercontent.com/apache/airflow/constraints-2-0/constraints-3.7.txt
+Those downloaded packages and constraint file can be pre-vetted by your security team before you attempt
+to install the image. You can also store those downloaded binary packages in your private artifact registry
+which allows for the flow where you will download the packages on one machine, submit only new packages for
+security vetting and only use the new packages when they were vetted.
-  pip download --dest docker-context-files \
-    --constraint docker-context-files/constraints-2-0.txt \
-    apache-airflow[async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv]==2.0.1
+On a separate (air-gaped) system, all the PyPI packages can be copied to ``docker-context-files``
+where you can build the image using the packages downloaded by passing those build args:

-Since apache-airflow .whl packages are treated differently by the docker image, you need to rename the
-downloaded apache-airflow* files, for example:
+  * ``INSTALL_FROM_DOCKER_CONTEXT_FILES="true"`` - to use packages present in ``docker-context-files``
+  * ``AIRFLOW_PRE_CACHED_PIP_PACKAGES="false"`` - to not pre-cache packages from PyPI when building image
+  * ``AIRFLOW_CONSTRAINTS_LOCATION=/docker-context-files/YOUR_CONSTRAINT_FILE.txt`` - to downloaded constraint files
+  * (Optional) ``INSTALL_MYSQL_CLIENT="false"`` if you do not want to install ``MySQL``
+    client from the Oracle repositories. In this case also make sure that your

-.. code-block:: bash
+Note, that the solution we have for installing python packages from local packages, only solves the problem
+of "air-gaped" python installation. The Docker image also downloads ``apt`` dependencies and ``node-modules``.
+Those type of dependencies are however more likely to be available in your "air-gaped" system via transparent
+proxies and it should automatically reach out to your private registries, however in the future the
+solution might be applied to both of those installation steps.

-  pushd docker-context-files
-  for file in apache?airflow*
-  do
-    mv ${file} _${file}
-  done
-  popd
+You can also use techniques described in the previous chapter to make ``docker build`` use your private
+apt sources or private PyPI repositories (via ``.pypirc``) available which can be security-vetted.
-Building the image:
+If you fulfill all the criteria, you can build the image on an air-gaped system by running command similar
+to the below:

-.. code-block:: bash
+.. exampleinclude:: docker-examples/restricted/restricted_environments.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]

-  ./breeze build-image \
-      --production-image --python 3.7 --install-airflow-version=2.0.1 \
-      --disable-mysql-client-installation --disable-pip-cache --install-from-local-files-when-building \
-      --constraints-location="/docker-context-files/constraints-2-0.txt"
+Modifying the Dockerfile
+........................

-or
+The build arg approach is a convenience method if you do not want to manually modify the ``Dockerfile``.
+Our approach is flexible enough, to be able to accommodate most requirements and
+customizations out-of-the-box. When you use it, you do not need to worry about adapting the image every
+time new version of Airflow is released. However sometimes it is not enough if you have very
+specific needs and want to build a very custom image. In such case you can simply modify the
+``Dockerfile`` manually as you see fit and store it in your forked repository. However you will have to
+make sure to rebase your changes whenever new version of Airflow is released, because we might modify
+the approach of our Dockerfile builds in the future and you might need to resolve conflicts
+and rebase your changes.

-.. code-block:: bash
+There are a few things to remember when you modify the ``Dockerfile``:

-  docker build . \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
-    --build-arg AIRFLOW_VERSION="2.0.1" \
-    --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty" \
-    --build-arg INSTALL_MYSQL_CLIENT="false" \
-    --build-arg AIRFLOW_PRE_CACHED_PIP_PACKAGES="false" \
-    --build-arg INSTALL_FROM_DOCKER_CONTEXT_FILES="true" \
-    --build-arg AIRFLOW_CONSTRAINTS_LOCATION="/docker-context-files/constraints-2-0.txt"
+* We are using the widely recommended pattern of ``.dockerignore`` where everything is ignored by default
+  and only the required folders are added through exclusion (!). This allows to keep docker context small
+  because there are many binary artifacts generated in the sources of Airflow and if they are added to
+  the context, the time of building the image would increase significantly. If you want to add any new
+  folders to be available in the image you must add it here with leading ``!``.

+.. code-block:: text

-Customizing & extending the image together
-..........................................
+  # Ignore everything
+  **

-You can combine both - customizing & extending the image. You can build the image first using
-``customize`` method (either with docker command or with ``breeze`` and then you can ``extend``
-the resulting image using ``FROM`` any dependencies you want.
+  # Allow only these directories
+  !airflow
+  ...

-Customizing PYPI installation
-.............................

-You can customize PYPI sources used during image build by adding a ``docker-context-files``/``.pypirc`` file
-This ``.pypirc`` will never be committed to the repository and will not be present in the final production image.
-It is added and used only in the build segment of the image so it is never copied to the final image.
+* The ``docker-context-files`` folder is automatically added to the context of the image, so if you want
+  to add individual files, binaries, requirement files etc you can add them there. The
+  ``docker-context-files`` is copied to the ``/docker-context-files`` folder of the build segment of the
+  image, so it is not present in the final image - which makes the final image smaller in case you want
+  to use those files only in the ``build`` segment. You must copy any files from the directory manually,
+  using COPY command if you want to get the files in your final image (in the main image segment).

Review comment:
```suggestion
  using ``COPY`` command if you want to get the files in your final image (in the main image segment).
```
