potiuk commented on a change in pull request #7832: [WIP] Add production image support
URL: https://github.com/apache/airflow/pull/7832#discussion_r396946424
 
 

 ##########
 File path: Dockerfile
 ##########
 @@ -0,0 +1,325 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# THIS DOCKERFILE IS INTENDED FOR PRODUCTION USE AND DEPLOYMENT.
+# NOTE! IT IS ALPHA-QUALITY FOR NOW - WE ARE IN THE PROCESS OF TESTING IT
+#
+#
+# This is a multi-segmented image. It actually contains two images:
+#
+# airflow-build-image  - there all airflow dependencies can be installed (and
+#                        built - for those dependencies that require
+#                        build essentials). Airflow is installed there with
+#                        --user switch so that all the dependencies are
+#                        installed to ${HOME}/.local
+#
+# main                 - this is the actual production image that is much
+#                        smaller because it does not contain all the build
+#                        essentials. Instead the ${HOME}/.local folder
+#                        is copied from the build-image - this way we have
+#                        only result of installation and we do not need
+#                        all the build essentials. This makes the image
+#                        much smaller.
+#
+ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster"
+
+ARG AIRFLOW_VERSION="2.0.0.dev0"
+ARG AIRFLOW_ORG="apache"
+ARG AIRFLOW_REPO="airflow"
+ARG AIRFLOW_GIT_REFERENCE="master"
+ARG AIRFLOW_EXTRAS="async,azure_blob_storage,azure_cosmos,azure_container_instances,celery,crypto,elasticsearch,gcp,kubernetes,mysql,postgres,s3,emr,redis,slack,ssh,statsd,virtualenv"
+
+ARG AIRFLOW_HOME=/opt/airflow
+ARG AIRFLOW_USER="airflow"
+ARG AIRFLOW_GROUP="airflow"
+ARG AIRFLOW_UID="50000"
+ARG AIRFLOW_GID="50000"
+
+ARG PIP_VERSION="19.0.2"
+ARG CASS_DRIVER_BUILD_CONCURRENCY="8"
+
+##############################################################################################
+# This is the build image where we build all dependencies
+##############################################################################################
+FROM ${PYTHON_BASE_IMAGE} as airflow-build-image
+SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
+
+LABEL org.apache.airflow.docker=true
+LABEL org.apache.airflow.distro="debian"
+LABEL org.apache.airflow.distro.version="buster"
+LABEL org.apache.airflow.module="airflow"
+LABEL org.apache.airflow.component="airflow"
+LABEL org.apache.airflow.image="airflow-build-image"
+LABEL org.apache.airflow.uid="${AIRFLOW_UID}"
+
+ARG AIRFLOW_VERSION
+ARG AIRFLOW_ORG
+ARG AIRFLOW_REPO
+ARG AIRFLOW_GIT_REFERENCE
+ARG AIRFLOW_EXTRAS
+
+ARG PIP_VERSION
+ARG CASS_DRIVER_BUILD_CONCURRENCY
+
+ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE}
+ENV AIRFLOW_VERSION=${AIRFLOW_VERSION}
+ENV AIRFLOW_ORG=${AIRFLOW_ORG}
+ENV AIRFLOW_REPO=${AIRFLOW_REPO}
+ENV AIRFLOW_GIT_REFERENCE=${AIRFLOW_GIT_REFERENCE}
+
+ENV AIRFLOW_EXTRAS=${AIRFLOW_EXTRAS}
+
+ENV AIRFLOW_REPO_URL="https://github.com/${AIRFLOW_ORG}/${AIRFLOW_REPO}"
+ENV AIRFLOW_RAW_CONTENT_URL="https://raw.githubusercontent.com/${AIRFLOW_ORG}/${AIRFLOW_REPO}"
+
+ENV PIP_VERSION=${PIP_VERSION}
+ENV CASS_DRIVER_BUILD_CONCURRENCY=${CASS_DRIVER_BUILD_CONCURRENCY}
+
+# Print versions
+RUN echo "Building airflow-build-image stage"; \
+    echo "Base image: ${PYTHON_BASE_IMAGE}"; \
+    echo "Airflow version: ${AIRFLOW_VERSION}"; \
+    echo "Airflow git reference: ${AIRFLOW_GIT_REFERENCE}"; \
+    echo "Airflow org: ${AIRFLOW_ORG}"; \
+    echo "Airflow repo: ${AIRFLOW_REPO}"; \
+    echo "Airflow repo url: ${AIRFLOW_REPO_URL}"; \
+    echo "Airflow extras: ${AIRFLOW_EXTRAS}" ;\
+    echo "PIP version: ${PIP_VERSION}" ;\
+    echo "Cassandra concurrency: ${CASS_DRIVER_BUILD_CONCURRENCY}" ;\
+    echo
+
+# Make sure noninteractive debian install is used and language variables set
+ENV DEBIAN_FRONTEND=noninteractive LANGUAGE=C.UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8 \
+    LC_CTYPE=C.UTF-8 LC_MESSAGES=C.UTF-8
+
+# Note missing man directories on debian-buster
+# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199
+# Install basic apt dependencies
+RUN mkdir -pv /usr/share/man/man1 \
+    && mkdir -pv /usr/share/man/man7 \
+    && apt-get update \
+    && apt-get install -y --no-install-recommends \
+           apt-transport-https \
+           apt-utils \
+           build-essential \
+           ca-certificates \
+           curl \
+           gnupg \
+           dirmngr \
+           freetds-bin \
+           freetds-dev \
+           gosu \
+           krb5-user \
+           ldap-utils \
+           libffi-dev \
+           libkrb5-dev \
+           libpq-dev \
+           libsasl2-2 \
+           libsasl2-dev \
+           libsasl2-modules \
+           libssl-dev \
+           locales  \
+           lsb-release \
+           openssh-client \
+           postgresql-client \
+           python-selinux \
+           sasl2-bin \
+           software-properties-common \
+           sqlite3 \
+           sudo \
+           unixodbc \
+           unixodbc-dev \
+    && apt-get autoremove -yqq --purge \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install MySQL client from Oracle repositories (Debian installs mariadb)
+RUN KEY="A4A9406876FCBD3C456770C88C718D3B5072E1F5" \
+    && GNUPGHOME="$(mktemp -d)" \
+    && export GNUPGHOME \
+    && for KEYSERVER in $(shuf -e \
+            ha.pool.sks-keyservers.net \
+            hkp://p80.pool.sks-keyservers.net:80 \
+            keyserver.ubuntu.com \
+            hkp://keyserver.ubuntu.com:80 \
+            pgp.mit.edu) ; do \
+          gpg --keyserver "${KEYSERVER}" --recv-keys "${KEY}" && break || true ; \
+       done \
+    && gpg --export "${KEY}" | apt-key add - \
+    && gpgconf --kill all \
+    && rm -rf "${GNUPGHOME}" \
+    && apt-key list > /dev/null \
+    && echo "deb http://repo.mysql.com/apt/debian/ stretch mysql-5.7" | tee -a /etc/apt/sources.list.d/mysql.list \
+    && apt-get update \
+    && apt-get install --no-install-recommends -y \
+        libmysqlclient-dev \
+        mysql-client \
+    && apt-get autoremove -yqq --purge \
+    && apt-get clean && rm -rf /var/lib/apt/lists/*
+
+# disable bytecode generation
+ENV PYTHONDONTWRITEBYTECODE=1
+
+RUN pip install --upgrade pip==${PIP_VERSION}
+
+RUN pip install --user \
+    "${AIRFLOW_REPO_URL}/archive/${AIRFLOW_GIT_REFERENCE}.tar.gz#egg=apache-airflow[${AIRFLOW_EXTRAS}]" \
+    --constraint "${AIRFLOW_RAW_CONTENT_URL}/${AIRFLOW_GIT_REFERENCE}/requirements.txt"
+
+##############################################################################################
+# This is the actual Airflow image - much smaller than the build one. We copy
+# installed Airflow and all its dependencies from the build image to make it smaller.
+##############################################################################################
+FROM ${PYTHON_BASE_IMAGE} as main
+SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
+
+LABEL org.apache.airflow.docker=true
+LABEL org.apache.airflow.distro="debian"
+LABEL org.apache.airflow.distro.version="buster"
+LABEL org.apache.airflow.module="airflow"
+LABEL org.apache.airflow.component="airflow"
+LABEL org.apache.airflow.image="airflow"
+LABEL org.apache.airflow.uid="${AIRFLOW_UID}"
+
+ARG AIRFLOW_VERSION
+
+ARG AIRFLOW_HOME
+ARG AIRFLOW_USER
+ARG AIRFLOW_GROUP
+ARG AIRFLOW_UID
+ARG AIRFLOW_GID
+
+ARG PIP_VERSION
+ARG CASS_DRIVER_BUILD_CONCURRENCY
+
+ENV PYTHON_BASE_IMAGE=${PYTHON_BASE_IMAGE}
+ENV AIRFLOW_VERSION=${AIRFLOW_VERSION}
+
+ENV AIRFLOW_HOME=${AIRFLOW_HOME}
+ENV AIRFLOW_USER=${AIRFLOW_USER}
+ENV AIRFLOW_GROUP=${AIRFLOW_GROUP}
+ENV AIRFLOW_UID=${AIRFLOW_UID}
+ENV AIRFLOW_GID=${AIRFLOW_GID}
+
+ENV PIP_VERSION=${PIP_VERSION}
+
+# Print versions
+RUN echo "Building main airflow image"; \
+    echo "Base image: ${PYTHON_BASE_IMAGE}"; \
+    echo "Airflow version: ${AIRFLOW_VERSION}"; \
+    echo "Airflow home: ${AIRFLOW_HOME}"; \
+    echo "Airflow user: ${AIRFLOW_USER}"; \
+    echo "Airflow uid: ${AIRFLOW_UID}" ;\
+    echo "PIP version: ${PIP_VERSION}" ;\
+    echo
+
+# Make sure noninteractive debian install is used and language variables set
+ENV DEBIAN_FRONTEND=noninteractive LANGUAGE=C.UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8 \
+    LC_CTYPE=C.UTF-8 LC_MESSAGES=C.UTF-8
+
+# Note missing man directories on debian-buster
+# https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199
+# Install basic apt dependencies
+RUN mkdir -pv /usr/share/man/man1 \
+    && mkdir -pv /usr/share/man/man7 \
+    && apt-get update \
+    && apt-get install -y --no-install-recommends \
+           apt-transport-https \
+           apt-utils \
+           ca-certificates \
+           curl \
+           dumb-init \
+           freetds-bin \
+           freetds-dev \
+           gnupg \
+           gosu \
+           krb5-user \
+           ldap-utils \
+           libffi-dev \
+           libkrb5-dev \
+           libpq-dev \
+           libsasl2-2 \
+           libsasl2-dev \
+           libsasl2-modules \
+           libssl-dev \
+           locales  \
+           lsb-release \
+           netcat \
+           openssh-client \
+           postgresql-client \
+           python-selinux \
+           sasl2-bin \
+           software-properties-common \
+           sqlite3 \
+           sudo \
+           unixodbc \
+           unixodbc-dev \
+    && apt-get autoremove -yqq --purge \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
+
+# Install MySQL client from Oracle repositories (Debian installs mariadb)
+RUN KEY="A4A9406876FCBD3C456770C88C718D3B5072E1F5" \
+    && GNUPGHOME="$(mktemp -d)" \
 
 Review comment:
   Yes. See the comment above, just before the second "FROM".
   
   We have two images (multi-segment):
   
   * The "build" image is there to run "pip install". It needs MySQL there in
     order to install the MySQL dev library and the pip dependencies. We only
     use that image to run "pip install --user" and to store the dependencies,
     and the libraries compiled during installation, in "${HOME}/.local".
   
   * The actual Airflow image: this one has no "build" dependencies
     (build-essential), and during optimization we can remove a few more
     unneeded libraries. From the "build" image we take ONLY the ".local"
     directory, which contains all the pip-installed Python dependencies and
     all the binary-compiled .so libraries they need. This way the resulting
     Airflow image is much smaller.
   
   We need the MySQL libraries in the Airflow image because ... we want to
   connect to MySQL :).
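   
   To illustrate the split described above, here is a minimal two-stage
   sketch. It is not the Dockerfile from this PR. It assumes both stages run
   as root (so "${HOME}/.local" resolves to /root/.local), it installs an
   unpinned "apache-airflow" with no extras, and it installs only
   build-essential in the build stage. Treat all three as simplifications for
   illustration.
   
   ```dockerfile
   # Illustrative sketch only; simplified from the pattern discussed in this review.
   ARG PYTHON_BASE_IMAGE="python:3.6-slim-buster"

   # Stage 1: contains compilers and -dev headers; used only to run "pip install --user".
   FROM ${PYTHON_BASE_IMAGE} AS airflow-build-image
   RUN apt-get update \
       && apt-get install -y --no-install-recommends build-essential \
       && rm -rf /var/lib/apt/lists/*
   # With --user, pip writes packages and compiled extensions under ~/.local.
   # Assumption: the stage runs as root, so that is /root/.local.
   RUN pip install --user apache-airflow

   # Stage 2: starts again from the slim base, so no build tooling is inherited.
   FROM ${PYTHON_BASE_IMAGE} AS main
   # Shared libraries that the compiled extensions link against at run time
   # (for example a MySQL client library) would still have to be installed in
   # this stage with apt-get; only the pip output is copied over.
   COPY --from=airflow-build-image /root/.local /root/.local
   # Make console scripts installed under ~/.local/bin discoverable.
   ENV PATH="/root/.local/bin:${PATH}"
   ```
   
   Only the COPY line carries anything from the first stage into the second,
   which is why the runtime client libraries have to be installed again in
   the final image while the compilers do not.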

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
