mik-laj commented on a change in pull request #14911:
URL: https://github.com/apache/airflow/pull/14911#discussion_r599148180



##########
File path: docs/docker-stack/build.rst
##########
@@ -41,210 +80,322 @@ Debian dependencies with ``apt`` or PyPI dependencies with ``pip install`` or an
 You should be aware, about a few things:
 
 * The production image of airflow uses "airflow" user, so if you want to add some of the tools
-  as ``root`` user, you need to switch to it with ``USER`` directive of the Dockerfile. Also you
-  should remember about following the
+  as ``root`` user, you need to switch to it with the ``USER`` directive of the Dockerfile and switch back to
+  the ``airflow`` user when you are done. Also, remember to follow the
   `best practises of Dockerfiles <https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
   to make sure your image is lean and small.
 
-  .. code-block:: dockerfile
+* The PyPI dependencies in Apache Airflow are installed in the user library of the "airflow" user, so
+  PyPI packages are installed in the ``~/.local`` folder, as if the ``--user`` flag was specified when running ``pip``.
+  Note also that using ``--no-cache-dir`` helps keep your image smaller.
 
-    FROM apache/airflow:2.0.1
-    USER root
-    RUN apt-get update \
-      && apt-get install -y --no-install-recommends \
-             my-awesome-apt-dependency-to-add \
-      && apt-get autoremove -yqq --purge \
-      && apt-get clean \
-      && rm -rf /var/lib/apt/lists/*
-    USER airflow
+* If your apt or PyPI dependencies require ``build-essential`` or other packages needed
+  to compile your Python dependencies, then your best choice is to follow the "Customize the image" route,
+  because it lets you build a highly optimized (for size) image. However, it requires checking out the sources
+  of Apache Airflow, so you might still prefer to add ``build-essential`` to your image,
+  even though your image will be significantly bigger.
 
+* You can also embed your DAGs in the image by adding them with the ``COPY`` directive of the Dockerfile.
+  The DAGs in the production image are in the ``/opt/airflow/dags`` folder.
 
-* PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so
-  you need to install them with the ``--user`` flag and WITHOUT switching to airflow user. Note also
-  that using --no-cache-dir is a good idea that can help to make your image smaller.
+* You can build your image without any Airflow sources. It is enough to place the
+  ``Dockerfile`` and any files it refers to (such as DAG files) in a separate directory and run
+  ``docker build . --tag my-image:my-tag`` (where ``my-image`` is the name you want to give the image
+  and ``my-tag`` is the tag you want to tag it with).
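+
+A minimal sketch that combines the points above might look like the following. It assumes a build
+directory containing this ``Dockerfile`` and a ``test_dag.py`` file; ``vim`` and ``lxml`` are only
+placeholder packages, so substitute your own dependencies.
+
+.. code-block:: dockerfile
+
+  FROM apache/airflow:2.0.1
+  # Switch to root only for the OS-level (apt) installation
+  USER root
+  RUN apt-get update \
+    && apt-get install -y --no-install-recommends \
+           vim \
+    && apt-get autoremove -yqq --purge \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+  # Switch back to the airflow user before installing PyPI packages
+  USER airflow
+  # PIP_USER is true in the 2.0.1 image, so this installs into ~/.local
+  RUN pip install --no-cache-dir lxml
+  # Embed a DAG in the folder the production image reads DAGs from
+  COPY test_dag.py /opt/airflow/dags/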
 
-  .. code-block:: dockerfile
+.. note::
+  As of the 2.0.1 image, the ``--user`` flag is turned on by default by setting the ``PIP_USER`` environment variable
+  to ``true``. This can be disabled by unsetting the variable or by setting it to ``false``. In the
+  2.0.0 image you had to add the ``--user`` flag explicitly, as in ``pip install --user``.
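+
+For instance, a minimal sketch for the older 2.0.0 image passes the flag explicitly (``lxml`` is
+only a placeholder package):
+
+.. code-block:: dockerfile
+
+  FROM apache/airflow:2.0.0
+  # PIP_USER is not set in the 2.0.0 image, so request a user-site install explicitly
+  RUN pip install --no-cache-dir --user lxml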
 
-    FROM apache/airflow:2.0.1
-    RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add
+Examples of image extending
+---------------------------
 
-* As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable
-  to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``.
+An ``apt`` package example
+..........................
 
+The following example adds ``vim`` to the airflow image.
 
-* If your apt, or PyPI dependencies require some of the build-essentials, then your best choice is
-  to follow the "Customize the image" route. However it requires to checkout sources of Apache Airflow,
-  so you might still want to choose to add build essentials to your image, even if your image will
-  be significantly bigger.
+.. exampleinclude:: docker-examples/extending/add-apt-packages/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
 
-  .. code-block:: dockerfile
+A ``PyPI`` package example
+..........................
 
-    FROM apache/airflow:2.0.1
-    USER root
-    RUN apt-get update \
-      && apt-get install -y --no-install-recommends \
-             build-essential my-awesome-apt-dependency-to-add \
-      && apt-get autoremove -yqq --purge \
-      && apt-get clean \
-      && rm -rf /var/lib/apt/lists/*
-    USER airflow
-    RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add
+The following example adds the ``lxml`` Python package from PyPI to the image.
 
-* You can also embed your dags in the image by simply adding them with COPY directive of Airflow.
-  The DAGs in production image are in ``/opt/airflow/dags`` folder.
+.. exampleinclude:: docker-examples/extending/add-pypi-packages/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+A ``build-essential`` requiring package example
+...............................................
+
+The following example adds the ``mpi4py`` package, which requires both ``build-essential`` and an MPI compiler.
+
+.. exampleinclude:: docker-examples/extending/add-build-essential-extend/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+The size of this image is about 1.1 GB when built. As you will see later, you can achieve a 20% reduction in
+the size of the image if you use "Customizing" rather than "Extending" the image.
+
+DAG embedding example
+.....................
+
+The following example adds ``test_dag.py`` to your image in the ``/opt/airflow/dags`` folder.
+
+.. exampleinclude:: docker-examples/extending/embedding-dags/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+
+.. exampleinclude:: docker-examples/extending/embedding-dags/test_dag.py
+    :language: Python
+    :start-after: [START dag]
+    :end-before: [END dag]
 
 Customizing the image
 ---------------------
 
-Customizing the image is an alternative way of adding your own dependencies to the image - better
-suited to prepare optimized production images.
+Customizing the image is an optimized way of adding your own dependencies to the image, better
+suited to preparing highly optimized (for size) production images, especially when you have dependencies
+that need to be compiled before installing (such as ``mpi4py``).
+
+It also allows more sophisticated usages needed by "power users", for example using a forked version
+of Airflow, or building the images from security-vetted sources.
+
+The big advantage of this method is that it produces an optimized image even if you need some compile-time
+dependencies that are not needed in the final image.
+
+The disadvantage is that you need Airflow sources to build such images, either from the
+`official distribution repository of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
+released versions, from the checked out sources (using release tags or main branches) in the
+`Airflow GitHub Project <https://github.com/apache/airflow>`_, or from your own fork
+if you happen to maintain one.
+
+Another disadvantage is that the pattern of building Docker images with ``--build-arg`` is less familiar
+to developers of such images. However, it is quite well known to "power users". That's why the
+customizing flow is better suited for users who are more familiar with it and have more custom
+requirements.
+
+The image also usually takes much longer to build than the equivalent "Extended" image because, instead of
+extending the layers that already come from the base image, it rebuilds the layers needed
+to add extra dependencies at early stages of the image build.
+
+When customizing the image, you can choose from a number of options for how you install Airflow:
+
+   * From the PyPI releases (default)
+   * From custom installation sources, using additional or replacement apt or PyPI repositories
+   * From local sources. This is used mostly during development.
+   * From a tag, branch, or specific commit of a GitHub Airflow repository (or fork). This is particularly
+     useful when you build an image for a custom version of Airflow that you keep in your fork and you do not
+     want to release the custom Airflow version to PyPI.
+   * From locally stored binary packages for Airflow, Airflow Providers and other dependencies. This is
+     particularly useful if you want to build Airflow in a highly secure environment where all such packages
+     must be vetted by your security team and stored in your private artifact registry. This also
+     allows building the Airflow image in an air-gapped environment.
+   * Side note: building ``Airflow`` in an ``air-gapped`` environment sounds pretty funny, doesn't it?
+
+You can also add a range of customizations while building the image:
+
+   * base python image you use for Airflow
+   * version of Airflow to install
+   * extras to install for Airflow (or even removing some default extras)
+   * additional apt/python dependencies to use while building Airflow (DEV dependencies)
+   * additional apt/python dependencies to install for the runtime version of Airflow (RUNTIME dependencies)
+   * additional commands and variables to set if needed during building or preparing the Airflow runtime
+   * choosing the constraint file to use when installing Airflow
+
+Additional explanation is needed for the last point. Airflow uses constraints to make sure
+that it can be predictably installed, even if some new versions of Airflow dependencies are
+released (or even dependencies of our dependencies!). The Docker image and accompanying scripts
+usually determine the right version of constraints automatically, based on the installed Airflow
+version and the Python version. For example, the 2.0.1 version of Airflow installed from PyPI
+uses constraints from the ``constraints-2.0.1`` tag. However, in some cases (when installing Airflow from
+GitHub, for example) you have to specify the version of constraints manually; otherwise
+it defaults to the latest version of the constraints, which might not be compatible with the
+version of Airflow you use.
+
+You can also download any version of Airflow constraints, adapt it by manually setting your own versions
+of dependencies, and then use the version of constraints that you prepared.
+
+You can read more about constraints in the
+`Installation <http://airflow.apache.org/docs/apache-airflow/stable/installation.html#constraints-files>`_ documentation.
+
+Examples of image customizing
+-----------------------------
+
+.. _image-build-pypi:
+
+
+Building from PyPI packages
+...........................
+
+This is the basic way of building the custom images from sources.
+
+The following example builds the production image for Python ``3.6`` with the latest PyPI-released Airflow,
+with the default set of Airflow extras and dependencies. The ``2.0.1`` constraints are used automatically.
 
-The advantage of this method is that it produces optimized image even if you need some compile-time
-dependencies that are not needed in the final image. You need to use Airflow Sources to build such images
-from the `official distribution folder of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
-released versions, or checked out from the GitHub project if you happen to do it from git sources.
+.. exampleinclude:: docker-examples/customizing/stable-airflow.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
-The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
-image by running appropriately crafted docker build in which you specify all the ``build-args``
-that you need to add to customize it. You can read about all the args and ways you can build the image
-in :doc:`build-arg-ref`.
+The following example builds the production image for Python ``3.7`` with default extras from the ``2.0.1`` PyPI
+package. The ``2.0.1`` constraints are used automatically.
 
-Here just a few examples are presented which should give you general understanding of what you can customize.
+.. exampleinclude:: docker-examples/customizing/pypi-selected-version.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
-This builds production image in version 3.6 with default extras from the local sources (master version
-of 2.0 currently):
+The following example builds the production image for Python ``3.8`` with additional Airflow extras
+(``mssql,hdfs``) from the ``2.0.1`` PyPI package, and an additional dependency (``oauth2client``).
 
-.. code-block:: bash
+.. exampleinclude:: docker-examples/customizing/pypi-extras-and-deps.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
-  docker build .
 
-This builds the production image in version 3.7 with default extras from 2.0.1 tag and
-constraints taken from constraints-2-0 branch in GitHub.
+The following example adds the ``mpi4py`` package, which requires both ``build-essential`` and an MPI compiler.
 
-.. code-block:: bash
+.. exampleinclude:: docker-examples/customizing/add-build-essential-custom.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
-  docker build . \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/apache/airflow/archive/2.0.1.tar.gz#egg=apache-airflow" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
-    --build-arg AIRFLOW_BRANCH="v1-10-test" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty"
+The above image is equivalent of the ""extended" image from previous chapter but it's size is only

Review comment:
   ```suggestion
   The above image is equivalent of the "extended" image from previous chapter but it's size is only
   ```



