mik-laj commented on a change in pull request #14911:
URL: https://github.com/apache/airflow/pull/14911#discussion_r599148180
##########
File path: docs/docker-stack/build.rst
##########
@@ -41,210 +80,322 @@ Debian dependencies with ``apt`` or PyPI dependencies with ``pip install`` or an
 You should be aware, about a few things:

 * The production image of airflow uses "airflow" user, so if you want to add some of the tools
-  as ``root`` user, you need to switch to it with ``USER`` directive of the Dockerfile. Also you
-  should remember about following the
+  as ``root`` user, you need to switch to it with ``USER`` directive of the Dockerfile and switch back to
+  ``airflow`` user when you are done. Also you should remember about following the
   `best practises of Dockerfiles <https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
   to make sure your image is lean and small.
-
-  .. code-block:: dockerfile
+* The PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so
+  PIP packages are installed to ~/.local folder as if the ``--user`` flag was specified when running PIP.
+  Note also that using ``--no-cache-dir`` is a good idea that can help to make your image smaller.

-     FROM apache/airflow:2.0.1
-     USER root
-     RUN apt-get update \
-       && apt-get install -y --no-install-recommends \
-            my-awesome-apt-dependency-to-add \
-       && apt-get autoremove -yqq --purge \
-       && apt-get clean \
-       && rm -rf /var/lib/apt/lists/*
-     USER airflow
+* If your apt, or PyPI dependencies require some of the ``build-essential`` or other packages that need
+  to compile your python dependencies, then your best choice is to follow the "Customize the image" route,
+  because you can build a highly-optimized (for size) image this way. However it requires to checkout sources
+  of Apache Airflow, so you might still want to choose to add ``build-essential`` to your image,
+  even if your image will be significantly bigger.
+* You can also embed your dags in the image by simply adding them with COPY directive of Airflow.
+  The DAGs in production image are in ``/opt/airflow/dags`` folder.

-* PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so
-  you need to install them with the ``--user`` flag and WITHOUT switching to airflow user. Note also
-  that using --no-cache-dir is a good idea that can help to make your image smaller.
+* You can build your image without any need for Airflow sources. It is enough that you place the
+  ``Dockerfile`` and any files that are referred to (such as Dag files) in a separate directory and run
+  a command ``docker build . --tag my-image:my-tag`` (where ``my-image`` is the name you want to name it
+  and ``my-tag`` is the tag you want to tag the image with.

-  .. code-block:: dockerfile
+.. note::
+  As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable
+  to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. In the
+  2.0.0 image you had to add the ``--user`` flag as ``pip install --user`` command.

-     FROM apache/airflow:2.0.1
-     RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add
+Examples of image extending
+---------------------------

-* As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable
-  to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``.
+An ``apt`` package example
+..........................

+The following example adds ``vim`` to the airflow image.

-* If your apt, or PyPI dependencies require some of the build-essentials, then your best choice is
-  to follow the "Customize the image" route. However it requires to checkout sources of Apache Airflow,
-  so you might still want to choose to add build essentials to your image, even if your image will
-  be significantly bigger.
+.. exampleinclude:: docker-examples/extending/add-apt-packages/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]

-  .. code-block:: dockerfile
+A ``PyPI`` package example
+..........................

-     FROM apache/airflow:2.0.1
-     USER root
-     RUN apt-get update \
-       && apt-get install -y --no-install-recommends \
-            build-essential my-awesome-apt-dependency-to-add \
-       && apt-get autoremove -yqq --purge \
-       && apt-get clean \
-       && rm -rf /var/lib/apt/lists/*
-     USER airflow
-     RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add
+The following example adds ``lxml`` python package from PyPI to the image.

-* You can also embed your dags in the image by simply adding them with COPY directive of Airflow.
-  The DAGs in production image are in ``/opt/airflow/dags`` folder.
+.. exampleinclude:: docker-examples/extending/add-pypi-packages/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+A ``build-essential`` requiring package example
+...............................................
+
+The following example adds ``mpi4py`` package which requires both ``build-essential`` and ``mpi compiler``.
+
+.. exampleinclude:: docker-examples/extending/add-build-essential-extend/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+The size of this image is ~ 1.1 GB when build. As you will see further, you can achieve 20% reduction in
+size of the image in case you use "Customizing" rather than "Extending" the image.
+
+DAG embedding example
+.....................
+
+The following example adds ``test_dag.py`` to your image in the ``/opt/airflow/dags`` folder.
+
+.. exampleinclude:: docker-examples/extending/embedding-dags/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+
+.. exampleinclude:: docker-examples/extending/embedding-dags/test_dag.py
+    :language: Python
+    :start-after: [START dag]
+    :end-before: [END dag]

 Customizing the image
 ---------------------

-Customizing the image is an alternative way of adding your own dependencies to the image - better
-suited to prepare optimized production images.
+Customizing the image is an optimized way of adding your own dependencies to the image - better
+suited to prepare highly optimized (for size) production images, especially when you have dependencies
+that require to be compiled before installing (such as ``mpi4py``).
+
+It also allows more sophisticated usages, needed by "Power-users" - for example using forked version
+of Airflow, or building the images from security-vetted sources.
+
+The big advantage of this method is that it produces optimized image even if you need some compile-time
+dependencies that are not needed in the final image.
+
+The disadvantage is that you need to use Airflow Sources to build such images from the
+`official distribution repository of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
+released versions, or from the checked out sources (using release tags or main branches) in the
+`Airflow GitHub Project <https://github.com/apache/airflow>`_ or from your own fork
+if you happen to do maintain your own fork of Airflow.
+
+Another disadvantage is that the pattern of building Docker images with ``--build-arg`` is less familiar
+to developers of such images. However it is quite well-known to "power-users". That's why the
+customizing flow is better suited for those users who have more familiarity and have more custom
+requirements.
+
+The image also usually builds much longer than the equivalent "Extended" image because instead of
+extending the layers that are already coming from the base image, it rebuilds the layers needed
+to add extra dependencies needed at early stages of image building.
+
+When customizing the image you can choose a number of options how you install Airflow:
+
+  * From the PyPI releases (default)
+  * From the custom installation sources - using additional/replacing the original apt or PyPI repositories
+  * From local sources. This is used mostly during development.
+  * From tag or branch, or specific commit from a GitHub Airflow repository (or fork). This is particularly
+    useful when you build image for a custom version of Airflow that you keep in your fork and you do not
+    want to release the custom Airflow version to PyPI.
+  * From locally stored binary packages for Airflow, Airflow Providers and other dependencies. This is
+    particularly useful if you want to build Airflow in a highly-secure environment where all such packages
+    must be vetted by your security team and stored in your private artifact registry. This also
+    allows to build airflow image in an air-gaped environment.
+  * Side note. Building ``Airflow`` in an ``air-gaped`` environment sounds pretty funny, doesn't it?
+
+You can also add a range of customizations while building the image:
+
+  * base python image you use for Airflow
+  * version of Airflow to install
+  * extras to install for Airflow (or even removing some default extras)
+  * additional apt/python dependencies to use while building Airflow (DEV dependencies)
+  * additional apt/python dependencies to install for runtime version of Airflow (RUNTIME dependencies)
+  * additional commands and variables to set if needed during building or preparing Airflow runtime
+  * choosing constraint file to use when installing Airflow
+
+Additional explanation is needed for the last point. Airflow uses constraints to make sure
+that it can be predictably installed, even if some new versions of Airflow dependencies are
+released (or even dependencies of our dependencies!). The docker image and accompanying scripts
+usually determine automatically the right versions of constraints to be used based on the Airflow
+version installed and Python version. For example 2.0.1 version of Airflow installed from PyPI
+uses constraints from ``constraints-2.0.1`` tag). However in some cases - when installing airflow from
+GitHub for example - you have to manually specify the version of constraints used, otherwise
+it will default to the latest version of the constraints which might not be compatible with the
+version of Airflow you use.
+
+You can also download any version of Airflow constraints and adapt it with your own set of
+constraints and manually set your own versions of dependencies in your own constraints and use the version
+of constraints that you manually prepared.
+
+You can read more about constraints in the documentation of the
+`Installation <http://airflow.apache.org/docs/apache-airflow/stable/installation.html#constraints-files>`_
+
+Examples of image customizing
+-----------------------------
+
+.. _image-build-pypi:
+
+
+Building from PyPI packages
+...........................
+
+This is the basic way of building the custom images from sources.
+
+The following example builds the production image in version ``3.6`` with latest PyPI-released Airflow,
+with default set of Airflow extras and dependencies. The ``2.0.1`` constraints are used automatically.

-The advantage of this method is that it produces optimized image even if you need some compile-time
-dependencies that are not needed in the final image. You need to use Airflow Sources to build such images
-from the `official distribution folder of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
-released versions, or checked out from the GitHub project if you happen to do it from git sources.
+.. exampleinclude:: docker-examples/customizing/stable-airflow.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]

-The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
-image by running appropriately crafted docker build in which you specify all the ``build-args``
-that you need to add to customize it. You can read about all the args and ways you can build the image
-in :doc:`build-arg-ref`.
+The following example builds the production image in version ``3.7`` with default extras from ``2.0.1`` PyPI
+package. The ``2.0.1`` constraints are used automatically.

-Here just a few examples are presented which should give you general understanding of what you can customize.
+.. exampleinclude:: docker-examples/customizing/pypi-selected-version.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]

-This builds production image in version 3.6 with default extras from the local sources (master version
-of 2.0 currently):
+The following example builds the production image in version ``3.8`` with additional airflow extras
+(``mssql,hdfs``) from ``2.0.1`` PyPI package, and additional dependency (``oauth2client``).

-.. code-block:: bash
+.. exampleinclude:: docker-examples/customizing/pypi-extras-and-deps.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]

-    docker build .

-This builds the production image in version 3.7 with default extras from 2.0.1 tag and
-constraints taken from constraints-2-0 branch in GitHub.
+The following example adds ``mpi4py`` package which requires both ``build-essential`` and ``mpi compiler``.

-.. code-block:: bash
+.. exampleinclude:: docker-examples/customizing/add-build-essential-custom.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]

-    docker build . \
-      --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-      --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-      --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/apache/airflow/archive/2.0.1.tar.gz#egg=apache-airflow" \
-      --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
-      --build-arg AIRFLOW_BRANCH="v1-10-test" \
-      --build-arg AIRFLOW_SOURCES_FROM="empty" \
-      --build-arg AIRFLOW_SOURCES_TO="/empty"
+The above image is equivalent of the ""extended" image from previous chapter but it's size is only

Review comment:

```suggestion
The above image is equivalent of the "extended" image from previous chapter but it's size is only
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
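Editor's note: for readers skimming this thread, the "extending" pattern the changed docs describe (install apt packages as ``root``, switch back to the ``airflow`` user, then ``pip install`` with ``--no-cache-dir``, and embed DAGs with ``COPY``) can be sketched in a single Dockerfile. This is only a sketch, not one of the ``exampleinclude`` files referenced in the diff; the ``vim``, ``lxml`` and ``test_dag.py`` names are borrowed from the diff's own examples.

```dockerfile
# Sketch only - combines the apt and PyPI extension patterns from the diff above.
FROM apache/airflow:2.0.1

# OS packages must be installed as root; switch back to "airflow" when done.
USER root
RUN apt-get update \
  && apt-get install -y --no-install-recommends \
       vim \
  && apt-get autoremove -yqq --purge \
  && apt-get clean \
  && rm -rf /var/lib/apt/lists/*
USER airflow

# As of the 2.0.1 image PIP_USER=true is set, so pip installs into ~/.local
# without an explicit --user flag; --no-cache-dir keeps the image smaller.
RUN pip install --no-cache-dir lxml

# DAGs can be embedded by copying them into the image's DAGs folder.
COPY test_dag.py /opt/airflow/dags/
```

Built with ``docker build . --tag my-image:my-tag`` from a directory containing only this ``Dockerfile`` and ``test_dag.py``, matching the "no Airflow sources needed" point in the diff.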
