mik-laj commented on a change in pull request #14911:
URL: https://github.com/apache/airflow/pull/14911#discussion_r599149214



##########
File path: docs/docker-stack/build.rst
##########
@@ -41,210 +80,322 @@ Debian dependencies with ``apt`` or PyPI dependencies with ``pip install`` or an
 You should be aware of a few things:
 
 * The production image of airflow uses the "airflow" user, so if you want to add some of the tools
-  as ``root`` user, you need to switch to it with ``USER`` directive of the Dockerfile. Also you
-  should remember about following the
+  as the ``root`` user, you need to switch to it with the ``USER`` directive of the Dockerfile and switch back to
+  the ``airflow`` user when you are done. Also you should remember to follow the
   `best practices of Dockerfiles <https://docs.docker.com/develop/develop-images/dockerfile_best-practices/>`_
   to make sure your image is lean and small.
 
-  .. code-block:: dockerfile
+* The PyPI dependencies in Apache Airflow are installed in the user library of the "airflow" user, so
+  PIP packages are installed to the ``~/.local`` folder, as if the ``--user`` flag was specified when running PIP.
+  Note also that using ``--no-cache-dir`` is a good idea that can help to make your image smaller.
 
-    FROM apache/airflow:2.0.1
-    USER root
-    RUN apt-get update \
-      && apt-get install -y --no-install-recommends \
-             my-awesome-apt-dependency-to-add \
-      && apt-get autoremove -yqq --purge \
-      && apt-get clean \
-      && rm -rf /var/lib/apt/lists/*
-    USER airflow
+* If your apt or PyPI dependencies require some of the ``build-essential`` or other packages needed
+  to compile your Python dependencies, then your best choice is to follow the "Customize the image" route,
+  because you can build a highly optimized (for size) image this way. However, it requires checking out the sources
+  of Apache Airflow, so you might still want to choose to add ``build-essential`` to your image,
+  even if your image will be significantly bigger.
 
+* You can also embed your DAGs in the image by simply adding them with the ``COPY`` directive of the Dockerfile.
+  The DAGs in the production image are in the ``/opt/airflow/dags`` folder.
 
-* PyPI dependencies in Apache Airflow are installed in the user library, of the "airflow" user, so
-  you need to install them with the ``--user`` flag and WITHOUT switching to airflow user. Note also
-  that using --no-cache-dir is a good idea that can help to make your image smaller.
+* You can build your image without any need for Airflow sources. It is enough that you place the
+  ``Dockerfile`` and any files that are referred to (such as DAG files) in a separate directory and run
+  the command ``docker build . --tag my-image:my-tag`` (where ``my-image`` is the name you want to give the image
+  and ``my-tag`` is the tag you want to tag the image with).
 
-  .. code-block:: dockerfile
+.. note::
+  As of the 2.0.1 image, the ``--user`` flag is turned on by default by setting the ``PIP_USER`` environment variable
+  to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``. In the
+  2.0.0 image you had to add the ``--user`` flag yourself, as in ``pip install --user``.
 
-    FROM apache/airflow:2.0.1
-    RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add
+Examples of image extending
+---------------------------
 
-* As of 2.0.1 image the ``--user`` flag is turned on by default by setting ``PIP_USER`` environment variable
-  to ``true``. This can be disabled by un-setting the variable or by setting it to ``false``.
+An ``apt`` package example
+..........................
 
+The following example adds ``vim`` to the Airflow image.
 
-* If your apt, or PyPI dependencies require some of the build-essentials, then your best choice is
-  to follow the "Customize the image" route. However it requires to checkout sources of Apache Airflow,
-  so you might still want to choose to add build essentials to your image, even if your image will
-  be significantly bigger.
+.. exampleinclude:: docker-examples/extending/add-apt-packages/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
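+
+Such a Dockerfile is typically just a thin layer on top of the base image. A minimal sketch of
+what the included example might look like (the base image tag is illustrative):
+
+.. code-block:: dockerfile
+
+    FROM apache/airflow:2.0.1
+    # Switch to root only for the apt installation, then back to the airflow user
+    USER root
+    RUN apt-get update \
+      && apt-get install -y --no-install-recommends vim \
+      && apt-get autoremove -yqq --purge \
+      && apt-get clean \
+      && rm -rf /var/lib/apt/lists/*
+    USER airflow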
 
-  .. code-block:: dockerfile
+A ``PyPI`` package example
+..........................
 
-    FROM apache/airflow:2.0.1
-    USER root
-    RUN apt-get update \
-      && apt-get install -y --no-install-recommends \
-             build-essential my-awesome-apt-dependency-to-add \
-      && apt-get autoremove -yqq --purge \
-      && apt-get clean \
-      && rm -rf /var/lib/apt/lists/*
-    USER airflow
-    RUN pip install --no-cache-dir --user my-awesome-pip-dependency-to-add
+The following example adds the ``lxml`` Python package from PyPI to the image.
 
-* You can also embed your dags in the image by simply adding them with COPY directive of Airflow.
-  The DAGs in production image are in ``/opt/airflow/dags`` folder.
+.. exampleinclude:: docker-examples/extending/add-pypi-packages/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
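+
+A minimal sketch of what such a Dockerfile might look like (the base image tag is illustrative):
+
+.. code-block:: dockerfile
+
+    FROM apache/airflow:2.0.1
+    # No --user flag needed: PIP_USER=true is set in the base image as of 2.0.1
+    RUN pip install --no-cache-dir lxml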
+
+A ``build-essential`` requiring package example
+...............................................
+
+The following example adds the ``mpi4py`` package, which requires both ``build-essential`` and an MPI compiler.
+
+.. exampleinclude:: docker-examples/extending/add-build-essential-extend/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
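+
+A sketch of what such a Dockerfile might look like (the base image tag and the exact apt packages
+providing the MPI compiler are illustrative):
+
+.. code-block:: dockerfile
+
+    FROM apache/airflow:2.0.1
+    USER root
+    RUN apt-get update \
+      && apt-get install -y --no-install-recommends build-essential libopenmpi-dev \
+      && apt-get autoremove -yqq --purge \
+      && apt-get clean \
+      && rm -rf /var/lib/apt/lists/*
+    USER airflow
+    RUN pip install --no-cache-dir mpi4py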
+
+The size of this image is ~1.1 GB when built. As you will see below, you can achieve a ~20% reduction in
+the size of the image if you use "customizing" rather than "extending" the image.
+
+DAG embedding example
+.....................
+
+The following example adds ``test_dag.py`` to your image in the ``/opt/airflow/dags`` folder.
+
+.. exampleinclude:: docker-examples/extending/embedding-dags/Dockerfile
+    :language: Dockerfile
+    :start-after: [START Dockerfile]
+    :end-before: [END Dockerfile]
+
+
+.. exampleinclude:: docker-examples/extending/embedding-dags/test_dag.py
+    :language: Python
+    :start-after: [START dag]
+    :end-before: [END dag]
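+
+A sketch of what such a Dockerfile might look like (assuming ``test_dag.py`` sits next to the
+``Dockerfile``; the base image tag is illustrative):
+
+.. code-block:: dockerfile
+
+    FROM apache/airflow:2.0.1
+    # DAGs in the production image live in /opt/airflow/dags
+    COPY test_dag.py /opt/airflow/dags/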
 
 Customizing the image
 ---------------------
 
-Customizing the image is an alternative way of adding your own dependencies to the image - better
-suited to prepare optimized production images.
+Customizing the image is an optimized way of adding your own dependencies to the image - better
+suited to prepare highly optimized (for size) production images, especially when you have dependencies
+that need to be compiled before installing (such as ``mpi4py``).
+
+It also allows more sophisticated usage, needed by "power users" - for example, using a forked version
+of Airflow, or building the images from security-vetted sources.
+
+The big advantage of this method is that it produces an optimized image even if you need some compile-time
+dependencies that are not needed in the final image.
+
+The disadvantage is that you need to use Airflow sources to build such images: from the
+`official distribution repository of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
+released versions, from checked out sources (using release tags or main branches) in the
+`Airflow GitHub Project <https://github.com/apache/airflow>`_, or from your own fork
+if you happen to maintain your own fork of Airflow.
+
+Another disadvantage is that the pattern of building Docker images with ``--build-arg`` is less familiar
+to developers of such images. However, it is quite well known to "power users". That's why the
+customizing flow is better suited for users who have more familiarity with image building and have more custom
+requirements.
+
+The image also usually takes much longer to build than the equivalent "extended" image, because instead of
+extending the layers that already come from the base image, it rebuilds the layers needed
+to add the extra dependencies at the early stages of image building.
+
+When customizing the image you can choose a number of options for how to install Airflow:
+
+   * From the PyPI releases (default)
+   * From custom installation sources - using additional apt or PyPI repositories, or replacing the original ones
+   * From local sources. This is used mostly during development.
+   * From a tag, branch, or specific commit from a GitHub Airflow repository (or fork). This is particularly
+     useful when you build an image for a custom version of Airflow that you keep in your fork and you do not
+     want to release the custom Airflow version to PyPI.
+   * From locally stored binary packages for Airflow, Airflow Providers and other dependencies. This is
+     particularly useful if you want to build Airflow in a highly secure environment where all such packages
+     must be vetted by your security team and stored in your private artifact registry. This also
+     allows building the Airflow image in an air-gapped environment.
+   * Side note. Building ``Airflow`` in an ``air-gapped`` environment sounds pretty funny, doesn't it?
+
+You can also add a range of customizations while building the image:
+
+   * the base Python image you use for Airflow
+   * the version of Airflow to install
+   * the extras to install for Airflow (or even removing some default extras)
+   * additional apt/python dependencies to use while building Airflow (DEV dependencies)
+   * additional apt/python dependencies to install in the runtime version of Airflow (RUNTIME dependencies)
+   * additional commands and variables to set if needed during building or preparing the Airflow runtime
+   * choosing the constraints file to use when installing Airflow
+
+Additional explanation is needed for the last point. Airflow uses constraints to make sure
+that it can be predictably installed, even if some new versions of Airflow dependencies are
+released (or even dependencies of our dependencies!). The Docker image and accompanying scripts
+usually automatically determine the right version of constraints to be used, based on the Airflow
+version installed and the Python version. For example, the 2.0.1 version of Airflow installed from PyPI
+uses constraints from the ``constraints-2.0.1`` tag. However, in some cases - when installing Airflow from
+GitHub for example - you have to specify the version of constraints manually, otherwise
+it will default to the latest version of the constraints, which might not be compatible with the
+version of Airflow you use.
+
+You can also download any version of the Airflow constraints, adapt it with your own set of
+constraints, manually set your own versions of dependencies in it, and use the version
+of constraints that you manually prepared.
+
+You can read more about constraints in the
+`Installation documentation <http://airflow.apache.org/docs/apache-airflow/stable/installation.html#constraints-files>`_.
+
+Examples of image customizing
+-----------------------------
+
+.. _image-build-pypi:
+
+
+Building from PyPI packages
+...........................
+
+This is the basic way of building the custom images from sources.
+
+The following example builds the production image with Python ``3.6`` and the latest PyPI-released Airflow,
+with the default set of Airflow extras and dependencies. The ``2.0.1`` constraints are used automatically.
 
-The advantage of this method is that it produces optimized image even if you need some compile-time
-dependencies that are not needed in the final image. You need to use Airflow Sources to build such images
-from the `official distribution folder of Apache Airflow <https://downloads.apache.org/airflow/>`_ for the
-released versions, or checked out from the GitHub project if you happen to do it from git sources.
+.. exampleinclude:: docker-examples/customizing/stable-airflow.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
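+
+A sketch of what such a build command might look like, run from the root of the checked out
+Airflow sources (the image name and tag are illustrative):
+
+.. code-block:: bash
+
+    docker build . --tag my-image:my-tag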
 
-The easiest way to build the image is to use ``breeze`` script, but you can also build such customized
-image by running appropriately crafted docker build in which you specify all the ``build-args``
-that you need to add to customize it. You can read about all the args and ways you can build the image
-in :doc:`build-arg-ref`.
+The following example builds the production image with Python ``3.7`` and default extras from the ``2.0.1`` PyPI
+package. The ``2.0.1`` constraints are used automatically.
 
-Here just a few examples are presented which should give you general understanding of what you can customize.
+.. exampleinclude:: docker-examples/customizing/pypi-selected-version.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
-This builds production image in version 3.6 with default extras from the local sources (master version
-of 2.0 currently):
+The following example builds the production image with Python ``3.8``, additional Airflow extras
+(``mssql,hdfs``) from the ``2.0.1`` PyPI package, and an additional dependency (``oauth2client``).
 
-.. code-block:: bash
+.. exampleinclude:: docker-examples/customizing/pypi-extras-and-deps.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
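+
+A sketch of what such a build command might look like (run from the Airflow sources; the image
+name and tag are illustrative):
+
+.. code-block:: bash
+
+    docker build . \
+      --build-arg AIRFLOW_VERSION="2.0.1" \
+      --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \
+      --build-arg ADDITIONAL_PYTHON_DEPS="oauth2client" \
+      --tag my-image:my-tag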
 
-  docker build .
 
-This builds the production image in version 3.7 with default extras from 2.0.1 tag and
-constraints taken from constraints-2-0 branch in GitHub.
+The following example adds the ``mpi4py`` package, which requires both ``build-essential`` and an MPI compiler.
 
-.. code-block:: bash
+.. exampleinclude:: docker-examples/customizing/add-build-essential-custom.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
-  docker build . \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/apache/airflow/archive/2.0.1.tar.gz#egg=apache-airflow" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
-    --build-arg AIRFLOW_BRANCH="v1-10-test" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty"
+The above image is equivalent to the "extended" image from the previous chapter, but its size is only
+874 MB. Compared to the 1.1 GB of the "extended" image, this is about 230 MB less, so you can achieve a ~20%
+improvement in the size of the image by using "customization" vs. extension. The saving can increase if you
+have more complex dependencies to build.
 
-This builds the production image in version 3.7 with default extras from 2.0.1 PyPI package and
-constraints taken from 2.0.1 tag in GitHub and pre-installed pip dependencies from the top
-of ``v1-10-test`` branch.
 
-.. code-block:: bash
+.. _image-build-optimized:
 
-  docker build . \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
-    --build-arg AIRFLOW_VERSION="2.0.1" \
-    --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \
-    --build-arg AIRFLOW_BRANCH="v1-10-test" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2.0.1" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty"
+Building optimized images
+.........................
 
-This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and
-additional python dependencies and pre-installed pip dependencies from 2.0.1 tagged constraints.
+The following example builds the production image with Python ``3.6`` and additional Airflow extras from the ``2.0.1``
+PyPI package, but it includes additional apt dev and runtime dependencies.
 
-.. code-block:: bash
+The dev dependencies are those that require ``build-essential`` and usually involve recompiling
+some Python dependencies, so those packages might require some additional DEV dependencies to be
+present during recompilation. Those packages are not needed at runtime, so we only install them at
+"build" time. They are not installed in the final image, thus producing much smaller images.
+In this case ``pandas`` requires recompilation, so it also needs ``gcc`` and ``g++`` as dev apt dependencies.
+The ``jre-headless`` package does not require recompiling, so it can be installed as a runtime apt dependency.
 
-  docker build . \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
-    --build-arg AIRFLOW_VERSION="2.0.1" \
-    --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \
-    --build-arg AIRFLOW_BRANCH="v1-10-test" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2.0.1" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty" \
-    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="mssql,hdfs" \
-    --build-arg ADDITIONAL_PYTHON_DEPS="sshtunnel oauth2client"
+.. exampleinclude:: docker-examples/customizing/pypi-dev-runtime-deps.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
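+
+A sketch of what such a build command might look like (the image name and tag are illustrative;
+the build args mirror the description above):
+
+.. code-block:: bash
+
+    docker build . \
+      --build-arg AIRFLOW_VERSION="2.0.1" \
+      --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \
+      --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \
+      --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \
+      --tag my-image:my-tag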
 
-This builds the production image in version 3.7 with additional airflow extras from 2.0.1 PyPI package and
-additional apt dev and runtime dependencies.
+.. _image-build-github:
 
-.. code-block:: bash
 
-  docker build . \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
-    --build-arg AIRFLOW_VERSION="2.0.1" \
-    --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty" \
-    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \
-    --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \
-    --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \
-    --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \
-    --tag my-image
+Building from GitHub
+....................
 
+This method is usually used for development purposes. But if you have your own fork, you can point
+it to your forked version of the source code without having to release it to PyPI. It is enough to have
+a branch or tag in your repository and use that tag or branch in the URL that you point the installation to.
 
-The same image can be built using ``breeze`` (it supports auto-completion of the options):
+In case of GitHub builds you need to pass the constraints reference manually if you want to use
+specific constraints, otherwise the default ``constraints-master`` is used.
 
-.. code-block:: bash
+The following example builds the production image with Python ``3.7`` and default extras from the latest master version;
+the constraints are taken from the latest version of the ``constraints-master`` branch in GitHub.
 
-  ./breeze build-image \
-      --production-image  --python 3.7 --install-airflow-version=2.0.1 \
-      --additional-extras=jdbc --additional-python-deps="pandas" \
-      --additional-dev-apt-deps="gcc g++" --additional-runtime-apt-deps="default-jre-headless"
+.. exampleinclude:: docker-examples/customizing/github-master.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
 
+The following example builds the production image with default extras from the
+latest ``v2-0-test`` version; the constraints are taken from the latest version of
+the ``constraints-2-0`` branch in GitHub. Note that this command might occasionally fail, as only
+the "released version" constraints (when building a released version) and the "master" constraints (when
+building master) are guaranteed to work.
+
+.. exampleinclude:: docker-examples/customizing/github-v2-0-test.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
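+
+A sketch of what such a build command might look like (the archive URL and constraints reference
+follow the pattern of the earlier GitHub-based examples; the image name and tag are illustrative):
+
+.. code-block:: bash
+
+    docker build . \
+      --build-arg AIRFLOW_INSTALLATION_METHOD="https://github.com/apache/airflow/archive/v2-0-test.tar.gz#egg=apache-airflow" \
+      --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
+      --tag my-image:my-tag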
+
+You can also specify another repository to build from. If you also want to use a different constraints
+repository source, you must specify it with the additional ``CONSTRAINTS_GITHUB_REPOSITORY`` build arg.
+
+The following example builds the production image using the ``potiuk/airflow`` fork of Airflow; the constraints
+are also downloaded from that repository.
+
+.. exampleinclude:: docker-examples/customizing/github-different-repository.sh
+    :language: bash
+    :start-after: [START build]
+    :end-before: [END build]
+
+.. _image-build-custom:
+
+Using custom installation sources
+.................................
 
 You can customize more aspects of the image - such as additional commands executed before apt dependencies
 are installed, or adding extra sources to install your dependencies from. You can see all the arguments
 described below, but here is an example of a rather complex command to customize the image,
 based on an example in `this comment <https://github.com/apache/airflow/issues/8605#issuecomment-690065621>`_:
 
-.. code-block:: bash
-
-  docker build . -f Dockerfile \
-    --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
-    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
-    --build-arg AIRFLOW_INSTALLATION_METHOD="apache-airflow" \
-    --build-arg AIRFLOW_VERSION="2.0.1" \
-    --build-arg AIRFLOW_VERSION_SPECIFICATION="==2.0.1" \
-    --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-2-0" \
-    --build-arg AIRFLOW_SOURCES_FROM="empty" \
-    --build-arg AIRFLOW_SOURCES_TO="/empty" \
-    --build-arg ADDITIONAL_AIRFLOW_EXTRAS="slack" \
-    --build-arg ADDITIONAL_PYTHON_DEPS=" \
-        apache-airflow-providers-odbc \
-        azure-storage-blob \
-        sshtunnel \
-        google-api-python-client \
-        oauth2client \
-        beautifulsoup4 \
-        dateparser \
-        rocketchat_API \
-        typeform" \
-    --build-arg ADDITIONAL_DEV_APT_DEPS="msodbcsql17 unixodbc-dev g++" \
-    --build-arg ADDITIONAL_DEV_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | \
-    apt-key add --no-tty - && \
-    curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \
-    --build-arg ADDITIONAL_DEV_ENV_VARS="ACCEPT_EULA=Y" \
-    --build-arg ADDITIONAL_RUNTIME_APT_COMMAND="curl https://packages.microsoft.com/keys/microsoft.asc | \
-    apt-key add --no-tty - && \
-    curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list" \
-    --build-arg ADDITIONAL_RUNTIME_APT_DEPS="msodbcsql17 unixodbc git procps vim" \
-    --build-arg ADDITIONAL_RUNTIME_ENV_VARS="ACCEPT_EULA=Y" \
-    --tag my-image
-
-Customizing images in high security restricted environments
-...........................................................
+In case you need to use your custom PyPI package indexes, you can also customize the PyPI sources used during
+the image build by adding a ``docker-context-files/.pypirc`` file when building the image.
+This ``.pypirc`` will not be committed to the repository (it is added to .gitignore) and it will not be

Review comment:
       ```suggestion
  This ``.pypirc`` will not be committed to the repository (it is added to ``.gitignore``) and it will not be
   ```



