potiuk commented on a change in pull request #7832: Add production image support
URL: https://github.com/apache/airflow/pull/7832#discussion_r402517891
##########
File path: IMAGES.rst
##########
@@ -0,0 +1,427 @@
+ .. Licensed to the Apache Software Foundation (ASF) under one
+    or more contributor license agreements. See the NOTICE file
+    distributed with this work for additional information
+    regarding copyright ownership. The ASF licenses this file
+    to you under the Apache License, Version 2.0 (the
+    "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+ .. http://www.apache.org/licenses/LICENSE-2.0
+
+ .. Unless required by applicable law or agreed to in writing,
+    software distributed under the License is distributed on an
+    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+    KIND, either express or implied. See the License for the
+    specific language governing permissions and limitations
+    under the License.
+
+.. contents:: :local:
+
+Airflow docker images
+=====================
+
+Airflow has two images (built from Dockerfiles):
+
+* CI image (Dockerfile.ci) - used for running tests and local development
+* Production image (Dockerfile) - used to run production-ready Airflow installations
+
+Image naming conventions
+========================
+
+The images are named as follows:
+
+``apache/airflow:<BRANCH_OR_TAG>-python<PYTHON_MAJOR_MINOR_VERSION>[-ci][-manifest]``
+
+where:
+
+* BRANCH_OR_TAG - branch or tag used when creating the image. Examples: master, v1-10-test, 1.10.10
+  The ``master`` and ``v1-10-test`` labels are built from branches so they change over time. The ``1.10.*`` and,
+  in the future, ``2.*`` labels are built from git tags and they are "fixed" once built.
+* PYTHON_MAJOR_MINOR_VERSION - version of python used to build the image. Examples: 3.5, 3.7
+* The ``-ci`` suffix is added for CI images
+* The ``-manifest`` suffix is added for manifest images (see below for explanation of manifest images)
+
+Building docker images
+======================
+
+The easiest way to build those images is to use `<BREEZE.rst>`_.
+
+Note!
+Breeze by default builds the production image from local sources. You can change its behaviour by
+providing the ``--install-airflow-version`` parameter, where you can specify the
+tag/branch used to download the Airflow package from the GitHub repository. You can
+also change the repository itself by adding ``--dockerhub-user`` and ``--dockerhub-repo`` flag values.
+
+You can build the CI image using this command:
+
+.. code-block::
+
+  ./breeze build-image
+
+You can build the production image using this command:
+
+.. code-block::
+
+  ./breeze build-image --production-image
+
+By adding the ``--python <PYTHON_MAJOR_MINOR_VERSION>`` parameter you can build the
+image version for the chosen python version.
+
+The images are built with default extras - different extras for CI and production image - and you
+can change the extras via the ``--extras`` parameter. You can see the default extras used via
+``./breeze flags``.
+
+For example, if you want to build the python 3.7 version of the production image with
+"all" extras installed, you should run this command:
+
+.. code-block::
+
+  ./breeze build-image --python 3.7 --extras "all" --production-image
+
+The command that builds the CI image is optimized to minimize the time needed to rebuild the image when
+the source code of Airflow evolves. This means that if you already have the image locally downloaded and
+built, the scripts will determine whether the rebuild is needed in the first place. Then the scripts will
+make sure that a minimal number of steps are executed to rebuild parts of the image (for example,
+PIP dependencies) and will give you an image consistent with the one used during Continuous Integration.
+
+The command that builds the production image is optimised for the size of the image.
+
+In Breeze by default, Airflow is installed using local sources of Apache Airflow.
+
+You can also build production images from PIP packages by providing the ``--install-airflow-version``
+parameter to Breeze:
+
+.. code-block::
+
+  ./breeze build-image --python 3.7 --extras=gcp --production-image --install-airflow-version=1.10.9
+
+This will build the image using a command similar to:
+
+.. code-block::
+
+  pip install apache-airflow[gcp]==1.10.9 \
+    --constraint https://raw.githubusercontent.com/apache/airflow/v1-10-test/requirements/requirements-python3.7.txt
+
+This will also download the entrypoint script from the https://raw.githubusercontent.com/apache/airflow/v1-10-test/entrypoint.sh
+url. It is important so that we have a matching version of the requirements.
+
+The requirement files and entrypoint only appeared in version 1.10.10 of airflow so if you install
+an earlier version - both constraint and requirements should point to 1.10.10 version.
+
+You can also build production images from a specific Git version by providing the ``--install-airflow-reference``
+parameter to Breeze:
+
+.. code-block::
+
+  pip install https://github.com/apache/airflow/archive/<tag>.tar.gz#egg=apache-airflow \
+    --constraint https://raw.githubusercontent.com/apache/airflow/<tag>/requirements/requirements-python3.7.txt
+
+This will also download the entrypoint script from the ``https://raw.githubusercontent.com/apache/airflow/<tag>/entrypoint.sh``
+url.
+
+Technical details of Airflow images
+===================================
+
+The CI image is used by Breeze as a shell image but it is also used during CI builds on Travis.
+The image is a single-segment image that contains an Airflow installation with "all" dependencies installed.
+It is optimised for rebuild speed (AIRFLOW_CONTAINER_CI_OPTIMISED_BUILD flag set to "true").
+It installs PIP dependencies from the current branch first - so that any changes in setup.py do not trigger
+reinstalling of all dependencies. There is a second step of installation that re-installs the dependencies
+from the latest sources so that we are sure that the latest dependencies are installed.
+
+The production image is a multi-segment image.
+The first segment "airflow-build-image" contains all the
+build essentials and related dependencies needed to install Airflow locally. By default the image is
+built from a released version of Airflow from GitHub, but by providing some extra arguments you can also
+build it from local sources. This is particularly useful in a CI environment where we are using the image
+to run Kubernetes tests. See below for the list of arguments that should be provided to build
+the production image from the local sources.
+
+Manually building the images
+----------------------------
+
+You can build the default production image with the standard ``docker build`` command but it will only build
+the default version of the image and will not use the DockerHub versions of images as cache.
+
+
+CI images
+.........
+
+The following build arguments (``--build-arg`` in docker build command) can be used for CI images:
+
++------------------------------------------+------------------------------------------+------------------------------------------+
+| Build argument                           | Default value                            | Description                              |
++==========================================+==========================================+==========================================+
+| ``PYTHON_BASE_IMAGE``                    | ``python:3.6-slim-buster``               | Base python image                        |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_VERSION``                      | ``2.0.0.dev0``                           | version of Airflow                       |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``PYTHON_MAJOR_MINOR_VERSION``           | ``3.6``                                  | major/minor version of Python (should    |
+|                                          |                                          | match base image)                        |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``DEPENDENCIES_EPOCH_NUMBER``            | ``2``                                    | increasing this number will reinstall    |
+|                                          |                                          | all apt dependencies                     |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``KUBECTL_VERSION``                      | ``v1.15.3``                              | version of kubectl installed             |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``KIND_VERSION``                         | ``v0.6.1``                               | version of kind installed                |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``PIP_NO_CACHE_DIR``                     | ``true``                                 | if true, then no pip cache will be       |
+|                                          |                                          | stored                                   |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``PIP_VERSION``                          | ``19.0.2``                               | version of PIP to use                    |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``HOME``                                 | ``/root``                                | Home directory of the root user (CI      |
+|                                          |                                          | image has root user as default)          |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_HOME``                         | ``/root/airflow``                        | Airflow’s HOME (that’s where logs and    |
+|                                          |                                          | sqlite databases are stored)             |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_SOURCES``                      | ``/opt/airflow``                         | Mounted sources of Airflow               |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``PIP_DEPENDENCIES_EPOCH_NUMBER``        | ``3``                                    | increasing that number will reinstall    |
+|                                          |                                          | all PIP dependencies                     |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``CASS_DRIVER_NO_CYTHON``                | ``1``                                    | if set to 1 no CYTHON compilation is     |
+|                                          |                                          | done for cassandra driver (much faster)  |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_CONTAINER_CI_OPTIMISED_BUILD`` | ``true``                                 | if set then PIP dependencies are         |
+|                                          |                                          | installed from repo first before they    |
+|                                          |                                          | are reinstalled from local sources. This |
+|                                          |                                          | allows for incremental faster builds     |
+|                                          |                                          | when requirements change                 |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_REPO``                         | ``apache/airflow``                       | the repository from which PIP            |
+|                                          |                                          | dependencies are installed (CI           |
+|                                          |                                          | optimised)                               |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_BRANCH``                       | ``master``                               | the branch from which PIP dependencies   |
+|                                          |                                          | are installed (CI optimised)             |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_CI_BUILD_EPOCH``               | ``1``                                    | increasing this value will reinstall PIP |
+|                                          |                                          | dependencies from the repository from    |
+|                                          |                                          | scratch                                  |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``AIRFLOW_EXTRAS``                       | ``all``                                  | extras to install                        |
++------------------------------------------+------------------------------------------+------------------------------------------+
+| ``ADDITIONAL_PYTHON_DEPS``               | \```\`                                   | additional python dependencies to        |
+|                                          |                                          | install                                  |
++------------------------------------------+------------------------------------------+------------------------------------------+
+
+Here are some examples of how CI images can be built manually. The CI image is always built from local sources.
+
+This builds the CI image in version 3.7 with default extras ("all").
+
+.. code-block::
+
+  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
+    --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7
+
+
+This builds the CI image in version 3.6 with "gcp" extra only.
+
+.. code-block::
+
+  docker build . -f Dockerfile.ci --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \

Review comment:
  Solved.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services