potiuk closed pull request #3976: [WIP] Integration tests
URL: https://github.com/apache/incubator-airflow/pull/3976
This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:
diff --git a/.gcloudignore b/.gcloudignore
new file mode 100644
index 0000000000..481bd2698a
--- /dev/null
+++ b/.gcloudignore
@@ -0,0 +1,17 @@
+# This file specifies files that are *not* uploaded to Google Cloud Platform
+# using gcloud. It follows the same syntax as .gitignore, with the addition of
+# "#!include" directives (which insert the entries of the given .gitignore-style
+# file at that point).
+#
+# For more information, run:
+# $ gcloud topic gcloudignore
+#
+.gcloudignore
+# If you would like to upload your .git directory, .gitignore file or files
+# from your .gitignore file, remove the corresponding line
+# below:
+.git
+.gitignore
+
+node_modules
+#!include:.gitignore
diff --git a/.travis.yml b/.travis.yml
index dd493363ab..641d1df5eb 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -25,16 +25,16 @@ env:
- SLUGIFY_USES_TEXT_UNIDECODE=yes
- TRAVIS_CACHE=$HOME/.travis_cache/
matrix:
- - TOX_ENV=flake8
- - TOX_ENV=py27-backend_mysql-env_docker
- - TOX_ENV=py27-backend_sqlite-env_docker
- - TOX_ENV=py27-backend_postgres-env_docker
- - TOX_ENV=py35-backend_mysql-env_docker PYTHON_VERSION=3
- - TOX_ENV=py35-backend_sqlite-env_docker PYTHON_VERSION=3
- - TOX_ENV=py35-backend_postgres-env_docker PYTHON_VERSION=3
- - TOX_ENV=py27-backend_postgres-env_kubernetes KUBERNETES_VERSION=v1.9.0
- - TOX_ENV=py35-backend_postgres-env_kubernetes KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3
-
+ - TOX_ENV=integration
+# - TOX_ENV=flake8
+# - TOX_ENV=py27-backend_mysql-env_docker
+# - TOX_ENV=py27-backend_sqlite-env_docker
+# - TOX_ENV=py27-backend_postgres-env_docker
+# - TOX_ENV=py35-backend_mysql-env_docker PYTHON_VERSION=3
+# - TOX_ENV=py35-backend_sqlite-env_docker PYTHON_VERSION=3
+# - TOX_ENV=py35-backend_postgres-env_docker PYTHON_VERSION=3
+# - TOX_ENV=py27-backend_postgres-env_kubernetes KUBERNETES_VERSION=v1.9.0
+# - TOX_ENV=py35-backend_postgres-env_kubernetes KUBERNETES_VERSION=v1.10.0 PYTHON_VERSION=3
cache:
directories:
- $HOME/.wheelhouse/
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index f114c66585..a41929c1ed 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -15,6 +15,7 @@ little bit helps, and credit will always be given.
* [Development and Testing](#development-and-testing)
  - [Setting up a development environment](#setting-up-a-development-environment)
- [Running unit tests](#running-unit-tests)
+ - [Running integration tests](#running-integration-tests)
* [Pull requests guidelines](#pull-request-guidelines)
* [Changing the Metadata Database](#changing-the-metadata-database)
@@ -193,6 +194,32 @@ See also the list of test classes and methods in `tests/core.py`.
Feel free to customize based on the extras available in [setup.py](./setup.py)
+### Running integration tests
+
+To run DAGs as integration tests locally, directly from the CLI, once your development
+environment is set up (directly on your system or through a Docker setup),
+you can simply run `./run_int_tests.sh`.
+
+It accepts 2 parameters:
+
+`-v / --vars`: a comma-separated list of key-value pairs (`[KEY1=VALUE1,KEY2=VALUE2,...]`) -
+these are Airflow variables, through which you can inject the necessary config values
+into the tested DAGs.
+
+`-d / --dags`: a path expression specifying which DAGs to run, e.g.
+`$AIRFLOW_HOME/incubator-airflow/airflow/contrib/example_dags/example_gcf*` will run all
+DAGs from `example_dags` with names beginning with `example_gcf`.
+
+Full example running tests for Google Cloud Functions operators:
+```
+./run_int_tests.sh \
+  --vars=[PROJECT_ID=<gcp_project_id>,LOCATION=<gcp_region>,SOURCE_REPOSITORY="https://source.developers.google.com/projects/<gcp_project_id>/repos/<your_repo>/moveable-aliases/master",ENTRYPOINT=helloWorld] \
+  --dags=$AIRFLOW_HOME/incubator-airflow/airflow/contrib/example_dags/example_gcf*
+```
+
+
## Pull Request Guidelines
Before you submit a pull request from your forked repo, check that it meets these guidelines:
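The values passed through `--vars` end up as ordinary Airflow Variables (the script calls `airflow variables -s KEY VALUE` for each pair), so a DAG under test reads them back with `Variable.get`. A minimal, hypothetical sketch of such a DAG follows; the dag_id, task and schedule are illustrative assumptions and are not part of this PR:

```
# Hypothetical integration-test DAG (not part of this PR) that reads the
# variables injected by ./run_int_tests.sh --vars=[PROJECT_ID=...,LOCATION=...]
from airflow import DAG
from airflow.models import Variable
from airflow.operators.bash_operator import BashOperator
from airflow.utils.dates import days_ago

# Set by the test harness via `airflow variables -s KEY VALUE`
PROJECT_ID = Variable.get('PROJECT_ID')
LOCATION = Variable.get('LOCATION')

with DAG(dag_id='example_gcf_smoke_check',   # hypothetical dag_id
         start_date=days_ago(1),             # matches the harness's "1 day ago" run date
         schedule_interval='@once') as dag:
    BashOperator(
        task_id='print_injected_variables',
        bash_command='echo "project=%s location=%s"' % (PROJECT_ID, LOCATION),
    )
```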
diff --git a/run_int_tests.sh b/run_int_tests.sh
new file mode 100755
index 0000000000..758274be23
--- /dev/null
+++ b/run_int_tests.sh
@@ -0,0 +1,139 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+for i in "$@"
+do
+case ${i} in
+ -v=*|--vars=*)
+ INT_TEST_VARS="${i#*=}"
+ shift # past argument=value
+ ;;
+ -d=*|--dags=*)
+ INT_TEST_DAGS="${i#*=}"
+ shift # past argument=value
+ ;;
+ *)
+ # unknown option
+ ;;
+esac
+done
+echo "VARIABLES = ${INT_TEST_VARS}"
+echo "DAGS = ${INT_TEST_DAGS}"
+if [[ -n $1 ]]; then
+ echo "Last line of file specified as non-opt/last argument:"
+ tail -1 $1
+fi
+
+# Remove square brackets if they exist
+TEMP=${INT_TEST_VARS//[}
+SANITIZED_VARIABLES=${TEMP//]}
+echo ""
+echo "========= AIRFLOW VARIABLES =========="
+echo ${SANITIZED_VARIABLES}
+echo ""
+
+IFS=',' read -ra ENVS <<< "${SANITIZED_VARIABLES}"
+for item in "${ENVS[@]}"; do
+ IFS='=' read -ra ENV <<< "$item"
+ airflow variables -s "${ENV[0]}" "${ENV[1]}"
+ echo "Set Airflow variable:"" ${ENV[0]}"" ${ENV[1]}"
+done
+
+AIRFLOW_HOME=${AIRFLOW_HOME:-/home/airflow}
+INT_TEST_DAGS=${INT_TEST_DAGS:-${AIRFLOW_HOME}/incubator-airflow/airflow/contrib/example_dags/*.py}
+INT_TEST_VARS=${INT_TEST_VARS:-"[PROJECT_ID=project,LOCATION=europe-west1,SOURCE_REPOSITORY=https://example.com,ENTRYPOINT=helloWorld]"}
+
+echo "Running test DAGs from: ${INT_TEST_DAGS}"
+rm -vf ${AIRFLOW_HOME}/dags/*
+cp -v ${INT_TEST_DAGS} ${AIRFLOW_HOME}/dags/
+
+airflow initdb
+tmux new-session -d -s webserver 'airflow webserver'
+sleep 2
+tmux new-session -d -s scheduler 'airflow scheduler'
+sleep 2
+
+function get_dag_state() {
+    tmp=$(airflow dag_state $1 $(date -d "1 day ago" '+%Y-%m-%dT00:00:00+00:00'))
+ result=$(echo "$tmp" | tail -1)
+ echo ${result}
+}
+
+results=()
+while read -r name ; do
+ echo "Unpausing $name"
+ airflow unpause ${name}
+ while [ "$(get_dag_state ${name})" = "running" ]
+ do
+ echo "Sleeping 1s..."
+ sleep 1
+ continue
+ done
+ res=$(get_dag_state ${name})
+ if ! [[ ${res} = "success" ]]; then
+ res="failed"
+ fi
+ echo ">>> FINISHED $name: "${res}
+ results+=("$name:"${res})
+done < <(ls ${AIRFLOW_HOME}/dags | grep '.*py$' | grep -Po '.*(?=\.)')
+# `ls ...` -> Get all .py files and remove the file extension from the names
+# ^ Process substitution to avoid the sub-shell and interact with array outside of the loop
+# https://unix.stackexchange.com/a/407794/78408
+
+echo ""
+echo "===== RESULTS: ====="
+for item in "${results[@]}"
+do
+ echo ${item}
+done
+echo ""
+
+for item in "${results[@]}"
+do
+ IFS=':' read -ra NAMES <<< "$item"
+ if [[ ${NAMES[1]} = "failed" ]]; then
+ dir_name="${NAMES[0]}"
+        for entry in ${AIRFLOW_HOME}/logs/${dir_name}/*
+ do
+ echo ""
+ echo ""
+ echo "===== ERROR LOG [START]: ${dir_name} ===== "
+ echo ${entry}
+ echo ""
+        tail -n 50 ${entry}/$(date -d "1 day ago" '+%Y-%m-%dT00:00:00+00:00')/1.log
+ echo ""
+ echo "===== ERROR LOG [END]: ${dir_name} ===== "
+ echo ""
+ done
+ fi
+done
+
+echo "NUMBER OF TESTS RUN: ${#results[@]}"
+
+for item in "${results[@]}"
+do
+ if ! [[ ${item} = *"success"* ]]; then
+ echo "STATUS: TESTS FAILED"
+ exit 1
+ fi
+done
+
+echo "STATUS: ALL TESTS SUCCEEDED"
+exit 0
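The heart of the script above is the loop that copies the selected DAG files into `${AIRFLOW_HOME}/dags`, unpauses each DAG, and polls `airflow dag_state` until the run dated "one day ago" is no longer `running`. The same flow, restated as a small Python sketch purely for readability (it assumes, as the script does, that each DAG's id equals its file name; it is not part of the PR):

```
# Illustrative Python restatement of the polling loop in run_int_tests.sh.
import glob
import os
import subprocess
import time
from datetime import datetime, timedelta

AIRFLOW_HOME = os.environ.get('AIRFLOW_HOME', '/home/airflow')
# The harness always asks about the run dated "one day ago" at midnight UTC.
EXECUTION_DATE = (datetime.utcnow() - timedelta(days=1)).strftime('%Y-%m-%dT00:00:00+00:00')

def dag_state(dag_id):
    # `airflow dag_state` prints log noise first; the state is on the last line.
    out = subprocess.check_output(['airflow', 'dag_state', dag_id, EXECUTION_DATE])
    return out.decode('utf-8').strip().splitlines()[-1]

results = {}
for path in glob.glob(os.path.join(AIRFLOW_HOME, 'dags', '*.py')):
    dag_id = os.path.splitext(os.path.basename(path))[0]   # file name == dag_id
    subprocess.check_call(['airflow', 'unpause', dag_id])
    while dag_state(dag_id) == 'running':
        time.sleep(1)
    results[dag_id] = 'success' if dag_state(dag_id) == 'success' else 'failed'

print(results)
```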
diff --git a/scripts/ci/5a-run-int-tests.sh b/scripts/ci/5a-run-int-tests.sh
new file mode 100755
index 0000000000..21f32e167d
--- /dev/null
+++ b/scripts/ci/5a-run-int-tests.sh
@@ -0,0 +1,205 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+set -o verbose
+
+pwd
+
+echo "Using travis airflow.cfg"
+DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+cp -f ${DIR}/airflow_travis.cfg ~/unittests.cfg
+
+ROOTDIR="$(dirname $(dirname ${DIR}))"
+export AIRFLOW__CORE__DAGS_FOLDER="$ROOTDIR/tests/dags"
+
+# add test/contrib to PYTHONPATH
+export PYTHONPATH=${PYTHONPATH:-$ROOTDIR/tests/test_utils}
+
+# environment
+export AIRFLOW_HOME=${AIRFLOW_HOME:=${HOME}}
+export AIRFLOW__CORE__UNIT_TEST_MODE=True
+
+# configuration test
+export AIRFLOW__TESTSECTION__TESTKEY=testvalue
+
+# any argument received is overriding the default nose execution arguments:
+nose_args=$@
+
+# Generate the `airflow` executable if needed
+which airflow > /dev/null || python setup.py develop
+
+# For impersonation tests on Travis, make airflow accessible to other users via the global PATH
+# (which contains /usr/local/bin)
+sudo ln -sf "${VIRTUAL_ENV}/bin/airflow" /usr/local/bin/
+
+# kdc init happens in setup_kdc.sh
+kinit -kt ${KRB5_KTNAME} airflow
+
+# For impersonation tests running on SQLite on Travis, make the database world readable so other
+# users can update it
+AIRFLOW_DB="$HOME/airflow.db"
+
+if [ -f "${AIRFLOW_DB}" ]; then
+ chmod a+rw "${AIRFLOW_DB}"
+ chmod g+rwx "${AIRFLOW_HOME}"
+fi
+
+if [ ! -z "${GCLOUD_SERVICE_KEY}" ]; then
+
+ echo "Installing lsb_release"
+ echo
+    sudo apt-get update && sudo apt-get install -y --no-install-recommends lsb-core
+ echo
+ echo "Starting the integration tests"
+ echo ${GCLOUD_SERVICE_KEY} | base64 --decode > /tmp/key.json
+ KEY_DIR=/tmp
+ GCP_SERVICE_ACCOUNT_KEY_NAME=key.json
+
+ echo
+ echo "Installing gcloud CLI"
+ echo
+ export CLOUD_SDK_REPO="cloud-sdk-$(lsb_release -c -s)"
+    echo "deb http://packages.cloud.google.com/apt $CLOUD_SDK_REPO main" | sudo tee -a /etc/apt/sources.list.d/google-cloud-sdk.list
+    curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
+    sudo apt-get update && sudo apt-get install -y --no-install-recommends google-cloud-sdk
+ echo
+    echo "Activating service account with ${KEY_DIR}/${GCP_SERVICE_ACCOUNT_KEY_NAME}"
+ echo
+    export GOOGLE_APPLICATION_CREDENTIALS=${KEY_DIR}/${GCP_SERVICE_ACCOUNT_KEY_NAME}
+ sudo gcloud auth activate-service-account \
+ --key-file="${KEY_DIR}/${GCP_SERVICE_ACCOUNT_KEY_NAME}"
+ ACCOUNT=$(cat "${KEY_DIR}/${GCP_SERVICE_ACCOUNT_KEY_NAME}" | \
+        python -c 'import json, sys; info=json.load(sys.stdin); print info["client_email"]')
+ PROJECT=$(cat "${KEY_DIR}/${GCP_SERVICE_ACCOUNT_KEY_NAME}" | \
+        python -c 'import json, sys; info=json.load(sys.stdin); print info["project_id"]')
+ sudo gcloud config set account "${ACCOUNT}"
+ sudo gcloud config set project "${PROJECT}"
+ echo "Initializing the DB"
+ yes | sudo airflow initdb
+ yes | sudo airflow resetdb
+ python ${DIR}/_setup_gcp_connection.py "${PROJECT}"
+ echo
+ echo "Service account activated"
+ echo
+ rm -vf /tmp/key.json
+else
+ echo "Skipping integration tests as no GCLOUD_SERVICE_KEY defined"
+ exit 0
+fi
+
+AIRFLOW_HOME=${AIRFLOW_HOME:-/home/airflow}
+INT_TEST_DAGS=${INT_TEST_DAGS:-${AIRFLOW_HOME}/incubator-airflow/airflow/contrib/example_dags/*.py}
+INT_TEST_VARS=${INT_TEST_VARS:-"[PROJECT_ID=project,LOCATION=europe-west1,SOURCE_REPOSITORY=https://example.com,ENTRYPOINT=helloWorld]"}
+
+echo "AIRFLOW_HOME = ${AIRFLOW_HOME}"
+echo "VARIABLES = ${INT_TEST_VARS}"
+echo "DAGS = ${INT_TEST_DAGS}"
+
+# Remove square brackets if they exist
+TEMP=${INT_TEST_VARS//[}
+SANITIZED_VARIABLES=${TEMP//]}
+echo ""
+echo "========= AIRFLOW VARIABLES =========="
+echo ${SANITIZED_VARIABLES}
+echo ""
+
+IFS=',' read -ra ENVS <<< "${SANITIZED_VARIABLES}"
+for item in "${ENVS[@]}"; do
+ IFS='=' read -ra ENV <<< "$item"
+ sudo airflow variables -s "${ENV[0]}" "${ENV[1]}"
+ echo "Set Airflow variable:"" ${ENV[0]}"" ${ENV[1]}"
+done
+
+echo "Running test DAGs from: ${INT_TEST_DAGS}"
+rm -fv ${AIRFLOW_HOME}/dags/*
+cp -v ${INT_TEST_DAGS} ${AIRFLOW_HOME}/dags/
+
+nohup sudo airflow webserver &
+sleep 2
+nohup sudo airflow scheduler &
+sleep 2
+
+function get_dag_state() {
+    tmp=$(sudo airflow dag_state $1 $(date -d "1 day ago" '+%Y-%m-%dT00:00:00+00:00'))
+ result=$(echo "$tmp" | tail -1)
+ echo ${result}
+}
+
+results=()
+while read -r name ; do
+ echo "Unpausing $name"
+ sudo airflow unpause ${name}
+ while [ "$(get_dag_state ${name})" = "running" ]
+ do
+ echo "Sleeping 1s..."
+ sleep 1
+ continue
+ done
+ res=$(get_dag_state ${name})
+ if ! [[ ${res} = "success" ]]; then
+ res="failed"
+ fi
+ echo ">>> FINISHED $name: "${res}
+ results+=("$name:"${res})
+done < <(ls ${AIRFLOW_HOME}/dags | grep '.*py$' | grep -Po '.*(?=\.)')
+# `ls ...` -> Get all .py files and remove the file extension from the names
+# ^ Process substitution to avoid the sub-shell and interact with array outside of the loop
+# https://unix.stackexchange.com/a/407794/78408
+
+echo ""
+echo "===== RESULTS: ====="
+for item in "${results[@]}"
+do
+ echo ${item}
+done
+echo ""
+
+for item in "${results[@]}"
+do
+ IFS=':' read -ra NAMES <<< "$item"
+ if [[ ${NAMES[1]} = "failed" ]]; then
+ dir_name="${NAMES[0]}"
+    for entry in ${AIRFLOW_HOME}/logs/${dir_name}/*
+ do
+ echo ""
+ echo ""
+ echo "===== ERROR LOG [START]: ${dir_name} ===== "
+ echo ${entry}
+ echo ""
+      tail -n 50 ${entry}/$(date -d "1 day ago" '+%Y-%m-%dT00:00:00+00:00')/1.log
+ echo ""
+ echo "===== ERROR LOG [END]: ${dir_name} ===== "
+ echo ""
+ done
+ fi
+done
+
+echo "NUMBER OF TESTS RUN: ${#results[@]}"
+
+for item in "${results[@]}"
+do
+ if ! [[ ${item} = *"success"* ]]; then
+ echo "STATUS: TESTS FAILED"
+ exit 1
+ fi
+done
+
+echo "STATUS: ALL TESTS SUCCEEDED"
+exit 0
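For reference, the two inline `python -c` calls above only pull `client_email` and `project_id` out of the decoded service-account key; written out, the equivalent is simply (a sketch assuming the key was decoded to /tmp/key.json, as the script does):

```
# Reads the service-account key decoded from GCLOUD_SERVICE_KEY and prints the
# two fields the script passes to `gcloud config set account` / `set project`.
import json

with open('/tmp/key.json') as key_file:
    info = json.load(key_file)

print(info['client_email'])
print(info['project_id'])
```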
diff --git a/scripts/ci/_setup_gcp_connection.py b/scripts/ci/_setup_gcp_connection.py
new file mode 100644
index 0000000000..ced7e01df0
--- /dev/null
+++ b/scripts/ci/_setup_gcp_connection.py
@@ -0,0 +1,49 @@
+# Copyright 2018 Google LLC
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# https://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+"""Writes GCP Connection to the airflow db."""
+import json
+import os
+import sys
+
+from airflow import models
+from airflow import settings
+
+KEYPATH_EXTRA = 'extra__google_cloud_platform__key_path'
+SCOPE_EXTRA = 'extra__google_cloud_platform__scope'
+PROJECT_EXTRA = 'extra__google_cloud_platform__project'
+
+full_key_path = '/tmp/key.json'
+if not os.path.isfile(full_key_path):
+ print
+ print 'The key file ' + full_key_path + ' is missing!'
+ print
+ sys.exit(1)
+
+session = settings.Session()
+try:
+ conn = session.query(models.Connection).filter(
+ models.Connection.conn_id == 'google_cloud_default')[0]
+ extras = conn.extra_dejson
+ extras[KEYPATH_EXTRA] = full_key_path
+ print 'Setting GCP key file to ' + full_key_path
+ extras[SCOPE_EXTRA] = 'https://www.googleapis.com/auth/cloud-platform'
+ extras[PROJECT_EXTRA] = sys.argv[1]
+ conn.extra = json.dumps(extras)
+ session.commit()
+except BaseException as e:
+ print 'session error' + str(e.message)
+ session.rollback()
+ raise
+finally:
+ session.close()
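A quick way to check what the script above wrote is to read the `google_cloud_default` connection back through the same session/model API it uses. A small verification sketch (not part of the PR):

```
# Prints the GCP extras written to google_cloud_default by _setup_gcp_connection.py.
from airflow import models, settings

session = settings.Session()
try:
    conn = session.query(models.Connection).filter(
        models.Connection.conn_id == 'google_cloud_default').one()
    extras = conn.extra_dejson
    print(extras.get('extra__google_cloud_platform__key_path'))
    print(extras.get('extra__google_cloud_platform__project'))
    print(extras.get('extra__google_cloud_platform__scope'))
finally:
    session.close()
```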
diff --git a/scripts/ci/docker-compose.yml b/scripts/ci/docker-compose.yml
index 101ad95297..83f2a32f94 100644
--- a/scripts/ci/docker-compose.yml
+++ b/scripts/ci/docker-compose.yml
@@ -81,6 +81,9 @@ services:
- TRAVIS_REPO_SLUG
- TRAVIS_OS_NAME
- TRAVIS_TAG
+ - INT_TEST_DAGS
+ - INT_TEST_VARS
+ - GCLOUD_SERVICE_KEY
depends_on:
- postgres
- mysql
diff --git a/tox.ini b/tox.ini
index f07641f29b..32fe3f7ce4 100644
--- a/tox.ini
+++ b/tox.ini
@@ -17,7 +17,7 @@
# under the License.
[tox]
-envlist = flake8,{py27,py35}-backend_{mysql,sqlite,postgres}-env_{docker,kubernetes}
+envlist = flake8,integration,{py27,py35}-backend_{mysql,sqlite,postgres}-env_{docker,kubernetes}
skipsdist = True
[global]
@@ -72,3 +72,18 @@ deps =
commands =
{toxinidir}/scripts/ci/flake8-diff.sh
+
+
+[testenv:integration]
+basepython = python2.7
+
+deps =
+ wheel
+
+commands =
+    pip wheel --progress-bar off -w {homedir}/.wheelhouse -f {homedir}/.wheelhouse -e .[devel_ci]
+    pip install --progress-bar off --find-links={homedir}/.wheelhouse --no-index -e .[devel_ci]
+ {toxinidir}/scripts/ci/1-setup-env.sh
+ {toxinidir}/scripts/ci/2-setup-kdc.sh
+ {toxinidir}/scripts/ci/3-setup-databases.sh
+ {toxinidir}/scripts/ci/5a-run-int-tests.sh []
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services