potiuk commented on a change in pull request #4937: [AIRFLOW-4116]
Multi-staging includes CI image [Step 2/3]
URL: https://github.com/apache/airflow/pull/4937#discussion_r268895206
##########
File path: Dockerfile
##########
@@ -85,14 +85,134 @@ RUN adduser airflow \
&& echo "airflow ALL=(ALL) NOPASSWD: ALL" > /etc/sudoers.d/airflow \
&& chmod 0440 /etc/sudoers.d/airflow
+############################################################################################################
+# This is an image with all APT dependencies needed by CI. It is built on top of the airlfow APT image
+# Parameters:
+# airflow-apt-deps - this is the base image for CI deps image.
+############################################################################################################
+FROM airflow-apt-deps as airflow-ci-apt-deps
+
+SHELL ["/bin/bash", "-o", "pipefail", "-e", "-u", "-x", "-c"]
+
+ENV JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/
+
+ARG APT_DEPS_IMAGE
+ENV APT_DEPS_IMAGE=${APT_DEPS_IMAGE}
+
+RUN echo "${APT_DEPS_IMAGE}"
+
+# Note the ifs below might be removed if Buildkit will become usable. It should skip building this
+# image automatically if it is not used. For now we still go through all layers below but they are empty
+# Note missing directories on debian-stretch https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=863199
+RUN if [[ "${APT_DEPS_IMAGE}" == "airflow-ci-apt-deps" ]]; then \
+ mkdir -pv /usr/share/man/man1 \
+ && mkdir -pv /usr/share/man/man7 \
+ && apt-get update \
+ && apt-get install --no-install-recommends -y \
+ lsb-release gnupg dirmngr openjdk-8-jdk \
+ vim tmux less unzip net-tools netcat \
+ ldap-utils postgresql-client sqlite3 \
+ krb5-user openssh-client openssh-server \
+ python-selinux \
+ && apt-get autoremove -yqq --purge \
+ && apt-get clean \
+ && rm -rf /var/lib/apt/lists/* \
+ ;\
+ fi
+
+RUN if [[ "${APT_DEPS_IMAGE}" == "airflow-ci-apt-deps" ]]; then \
+ KEY="A4A9406876FCBD3C456770C88C718D3B5072E1F5" \
+ && GNUPGHOME="$(mktemp -d)" \
+ && export GNUPGHOME \
+ && for KEYSERVER in $(shuf -e \
+ ha.pool.sks-keyservers.net \
+ hkp://p80.pool.sks-keyservers.net:80 \
+ keyserver.ubuntu.com \
+ hkp://keyserver.ubuntu.com:80 \
+ pgp.mit.edu) ; do \
+ gpg --keyserver "${KEYSERVER}" --recv-keys "${KEY}" && break || true ; \
+ done \
+ && gpg --export "${KEY}" > /etc/apt/trusted.gpg.d/mysql.gpg \
+ && gpgconf --kill all \
+ rm -rf "${GNUPGHOME}"; \
+ apt-key list > /dev/null \
+ && echo "deb http://repo.mysql.com/apt/ubuntu/ trusty mysql-5.7" | \
+ tee -a /etc/apt/sources.list.d/mysql.list \
+ && apt-get update \
+ && MYSQL_PASS="secret" \
+ && debconf-set-selections <<< \
+ "mysql-community-server mysql-community-server/data-dir select ''" \
+ && debconf-set-selections <<< \
+ "mysql-community-server mysql-community-server/root-pass password ${MYSQL_PASS}" \
+ && debconf-set-selections <<< \
+ "mysql-community-server mysql-community-server/re-root-pass password ${MYSQL_PASS}" \
Review comment:
I am not 100% sure now, but I believe I struggled with similar problems when
implementing the CloudSQL operators (with cloudsqlproxy). I spent quite some
time trying to find the best way to install the mysql client on
debian-stretch, so this might be a leftover of that trial and error, and it
might turn out not to be needed once I clean it up.
I might find another, simpler solution - like not using the root user for the
mysql db at all, or enforcing the TCP protocol.
Sorry for being so verbose, but I am venting my frustrations with mysql
connectivity issues :). Some people might even find it interesting :).
OK. Beginning of rant on MySQL.
The problem is that the original scripts from ci_script use a passwordless,
remote root connection to create the airflow database, and I probably want to
change that. This is the most probable reason why the Travis tests fail now -
some authentication option somewhere, either on the client or on the server,
prevents the query below from running. We currently get "Host mysql does not
exist", but really this is a generic error shown whenever there is any problem
with authentication. Yeah - security first - better to pretend that you are
not there than to admit that the password is wrong.
Here is the offending query:
    mysql -h ${MYSQL_HOST} -u root -e 'drop database if exists airflow; create database airflow'
The main reason is that MySQL 5.7 changed its security model: logging in as
the MySQL root user now requires sudo. By default you cannot log in as root
unless you have sudo. And that's where the journey starts.
For mysql, "root only with sudo" is checked and enforced in both the client
and the server, and both read their configuration from server configuration
variables (Yay!). The problem is that by default the root user is only allowed
to log in using UNIX sockets - which allow only local connectivity and are the
only way to check for user privileges/sudo. Imagine trying to connect via
cloudsqlproxy, which can forward a UNIX socket ... but then loses the sudo
check as we are on a different machine (yeah!).
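
To see which case applies on a given server, one can look at the authentication plugin assigned to the root accounts. A minimal sketch, assuming a MySQL 5.7 server where the `mysql.user` table has a `plugin` column and socket-only logins show up as `auth_socket`. The helper name `root_auth_sql` is hypothetical and not part of this PR. It only prints the SQL, so it can be reviewed before being piped into a client:

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): print the SQL that shows
# which authentication plugin each root account uses.
# Assumption: MySQL 5.7, where socket-only logins appear as "auth_socket".
root_auth_sql() {
    cat <<'EOF'
SELECT user, host, plugin FROM mysql.user WHERE user = 'root';
EOF
}

# Dry run: only print the statement. To execute it, pipe it into a local,
# privileged client, for example: root_auth_sql | sudo mysql
root_auth_sql
```

Printing rather than executing keeps the sketch usable on a machine without a database, which is roughly the situation during an image build.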
There is also more trickery connected with how mysql treats localhost and
127.0.0.1 differently, and possibly with how the local docker hostname is
treated in the docker network created by docker compose. There are multiple
threads on StackOverflow, some with several hundred upvotes (like this one:
https://askubuntu.com/questions/766334/cant-login-as-mysql-user-root-from-normal-user-account-in-ubuntu-16-04).
This one is my favourite (almost 500 upvotes):
https://stackoverflow.com/questions/7739645/install-mysql-on-ubuntu-without-a-password-prompt
. There are many ways you can set this up, and it is different for different
versions of mysql. If you look at it closely, most of the suggested ways are
really a catch-22: you need to connect to a running database and run a query
to change the root user configuration in order to be able to connect via TCP.
If you want to run it remotely on a Dockerised database - you are basically
out of luck :). MySQL connectivity for root is a big ball of mud really.
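
On the localhost vs 127.0.0.1 point: on Unix the mysql client treats the host name `localhost` specially and connects through the socket file, while `127.0.0.1` or an explicit `--protocol=TCP` goes over TCP. A minimal sketch of enforcing TCP, assuming the `MYSQL_HOST` variable from the ci script. The `mysql_tcp_cmd` helper is hypothetical and only prints the command line. Whether root is then let in still depends on how the account is configured on the server:

```shell
#!/bin/sh
# Hypothetical helper (not part of this PR): build a mysql client command
# line that forces TCP instead of the UNIX socket.
# Assumption: MYSQL_HOST is set as in the ci script; defaults to 127.0.0.1.
mysql_tcp_cmd() {
    host="${1:-${MYSQL_HOST:-127.0.0.1}}"
    user="${2:-root}"
    # --protocol=TCP makes the client use TCP even for "localhost".
    printf 'mysql --protocol=TCP -h %s -u %s\n' "${host}" "${user}"
}

# Dry run: print the command instead of executing it.
mysql_tcp_cmd "${MYSQL_HOST:-127.0.0.1}" root
```

This does not remove the catch-22 described above. It only takes the client-side socket default out of the picture, so that any remaining failure points at the server account configuration.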
And yes - this is for the server, but it requires some trickery (setting the
server variables) to change this default behaviour on the client side as well
as on the server. If you don't set those parameters, the client will always
try to use UNIX socket authentication for the root user, no matter what you
specify as host. Setting the server parameters is the trick that makes the
client side allow TCP communication for the root user and gets the client
installed without asking for the root password (wait what? yeah, I know).
Just for curiosity - yet another interesting problem I had initially when I
used wheels. I created the wheel packages in a different image (based on the
same python image), which used mariadb (the default) rather than mysql, while
I installed mysql manually only in the CI image. That was the first problem
and the result was rather strange: the wheel packages (mysql db api) were
compiled/linked against mariadb.so, but we only had the mysql libraries (the
mariadb one was removed), so the connection attempts failed of course. This
last problem is now gone, as it turned out that wheel packages give only
marginal improvements and complicate things a lot.