This is an automated email from the ASF dual-hosted git repository.
potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new d872f7ed93 Update integration tests docs and add "Writing Integration
Tests" guide (#39986)
d872f7ed93 is described below
commit d872f7ed932ba9b66866539071f0a990744c2e90
Author: Shahar Epstein <[email protected]>
AuthorDate: Sat Jun 1 15:57:16 2024 +0300
Update integration tests docs and add "Writing Integration Tests" guide
(#39986)
Co-authored-by: Jarek Potiuk <[email protected]>
---
contributing-docs/testing/integration_tests.rst | 139 +++++++++++++++++++++---
1 file changed, 125 insertions(+), 14 deletions(-)
diff --git a/contributing-docs/testing/integration_tests.rst
b/contributing-docs/testing/integration_tests.rst
index 2c55d1f857..80cf169d90 100644
--- a/contributing-docs/testing/integration_tests.rst
+++ b/contributing-docs/testing/integration_tests.rst
@@ -18,9 +18,11 @@
Airflow Integration Tests
=========================
-Some of the tests in Airflow are integration tests. These tests require
``airflow`` Docker
-image and extra images with integrations (such as ``celery``, ``mongodb``,
etc.).
-The integration tests are all stored in the ``tests/integration`` folder.
+Integration tests in Airflow check the interactions between Airflow components
and external services
+that could run as separate Docker containers, without connecting to an
external API on the internet.
+These tests require the ``airflow`` Docker image and extra images with integrations (such as ``celery``, ``mongodb``, etc.).
+The integration tests are all stored in the ``tests/integration`` folder and, similarly to the unit tests, they all run
+using `pytest <http://doc.pytest.org/en/latest/>`_, but they are skipped by default unless the ``--integration`` flag is passed to pytest.
**The outline for this document in GitHub is available at top-right corner
button (with 3-dots and 3 lines).**
@@ -28,18 +30,20 @@ Enabling Integrations
---------------------
Airflow integration tests cannot be run in the local virtualenv. They can only
run in the Breeze
-environment with enabled integrations and in the CI. See `CI
<../../dev/breeze/doc/ci/README.md>`_ for
+environment and in the CI, with their respective integrations enabled. See `CI
<../../dev/breeze/doc/ci/README.md>`_ for
details about Airflow CI.
-When you are in the Breeze environment, by default, all integrations are
disabled. This enables only true unit tests
-to be executed in Breeze. You can enable the integration by passing the
``--integration <INTEGRATION>``
-switch when starting Breeze. You can specify multiple integrations by
repeating the ``--integration`` switch
-or using the ``--integration all-testable`` switch that enables all testable
integrations and
-``--integration all`` switch that enables all integrations.
+When you start a Breeze environment, all integrations are disabled by default. This allows only unit tests
+to be executed in Breeze. You can enable an integration by passing the
``--integration <INTEGRATION>``
+switch when starting Breeze, either with ``breeze shell`` or with ``breeze
start-airflow``. As there's no need to simulate
+a full setup of Airflow during integration tests, using ``breeze shell`` (or
simply ``breeze``) to run them is
sufficient. You can specify multiple integrations by repeating the ``--integration`` switch, or enable all testable
integrations at once with the ``--integration all-testable`` switch. You can use the ``--integration all`` switch to enable
all integrations, including non-testable ones such as ``openlineage``.
NOTE: Every integration requires a separate container with the corresponding
integration image.
These containers take precious resources on your PC, mainly memory. The started integrations are not stopped
-until you stop the Breeze environment with the ``stop`` command and started
with the ``start`` command.
+until you stop the Breeze environment with the ``breeze down`` command.
The following integrations are available:
@@ -77,18 +81,20 @@ The following integrations are available:
.. END AUTO-GENERATED INTEGRATION LIST'
-To start the ``mongo`` integration only, enter:
+To start a shell with ``mongo`` integration enabled, enter:
.. code-block:: bash
breeze --integration mongo
-To start ``mongo`` and ``cassandra`` integrations, enter:
+You can pass multiple ``--integration`` options, one for each integration that you want to enable.
+For example, to start a shell with both ``mongo`` and ``cassandra``
integrations enabled, enter:
.. code-block:: bash
breeze --integration mongo --integration cassandra
+
To start all testable integrations, enter:
.. code-block:: bash
@@ -99,7 +105,7 @@ To start all integrations, enter:
.. code-block:: bash
- breeze --integration all-testable
+ breeze --integration all
Note that Kerberos is a special kind of integration. Some tests run
differently when
Kerberos integration is enabled (they retrieve and use a Kerberos
authentication token) and differently when the
@@ -109,7 +115,7 @@ for the CI system should run all tests with the Kerberos
integration enabled to
Running Integration Tests
-------------------------
-All tests using an integration are marked with a custom pytest marker
``pytest.mark.integration``.
+All integration tests are marked with a custom pytest marker
``pytest.mark.integration``.
The marker has a single parameter - the name of the integration.
Example of the ``celery`` integration test:
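The example itself is elided from this hunk; schematically, such a marked test has the following shape (a hypothetical sketch, not the exact test from the repository):

```python
import pytest


# Hypothetical sketch of an integration-marked test; the real celery
# example lives under tests/integration. The marker's single argument
# names the integration that must be enabled for the test to run.
@pytest.mark.integration("celery")
def test_celery_roundtrip():
    # The body would exercise Airflow against the real celery container
    # that Breeze started with `--integration celery`.
    ...
```

Without the matching ``--integration celery`` flag, pytest simply reports the test as skipped.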
@@ -172,6 +178,111 @@ Runs all mongo DB tests:
breeze testing integration-tests --db-reset --integration mongo
+Writing Integration Tests
+-------------------------
+Before writing the integration tests, you first need to make the integration itself (i.e., the service) available for use.
+For that, you'll first need to create a Docker Compose YAML file under
``scripts/ci/docker-compose``, named
+``integration-<INTEGRATION>.yml``. The file should define one service for the
integration, and another one
+for the Airflow instance that depends on it. It is recommended to stick to the
following guidelines:
+
+
+1. Set ``services::<INTEGRATION>::container_name`` to the service's name and give it an appropriate description under
+``services::<INTEGRATION>::labels:breeze.description``, so that it is easier to identify in Docker for debugging
+purposes.
+
+2. Use an official stable release of the service with a pinned version. When there are several possible images,
+you should probably pick the latest version that is supported by Airflow.
+
+3. Set the ``services::<INTEGRATION>::restart`` to "on-failure".
+
+4. For integrations that require persisting data (for example, databases), define a volume at ``volumes::<VOLUME_NAME>``
+and mount the volume to the data path on the container by listing it under ``services::<INTEGRATION>::volumes``
+(see example).
+
+5. Check which ports should be exposed to use the service, and carefully validate that these ports are not in use by other
+integrations (consult the community on what to do if such a case happens). To avoid conflicts with the host's ports, it is
+good practice to prefix the corresponding host port with a number (usually 2), parametrize it, and list the parameter
+under the ``# Initialise base variables`` section in ``dev/breeze/src/airflow_breeze/global_constants.py``.
+
+6. In some cases you might need to change how the service's container starts, for example, by setting
+``stdin_open: true``.
+
+7. In the Airflow service definition, ensure that it depends on the integration's service (``depends_on``) and set
+the environment variable ``INTEGRATION_<INTEGRATION>`` to true.
+
+8. If you need to mount a file (for example, a configuration file), you could
put it at ``scripts/ci/docker-compose``
+(or a subfolder of this path) and list it under
``services::<INTEGRATION>::volumes``.
+
+For example, ``integration-drill.yml`` looks as follows:
+
+ .. code-block:: yaml
+
+     version: "3.8"
+     services:
+       drill:
+         container_name: drill
+         image: "apache/drill:1.21.1-openjdk-17"
+         labels:
+           breeze.description: "Integration required for drill operator and hook."
+         volumes:
+           - drill-db-volume:/data
+           - ./drill/drill-override.conf:/opt/drill/conf/drill-override.conf
+         restart: "on-failure"
+         ports:
+           - "${DRILL_HOST_PORT}:8047"
+         stdin_open: true
+       airflow:
+         depends_on:
+           - drill
+         environment:
+           - INTEGRATION_DRILL=true
+     volumes:
+       drill-db-volume:
+
+
+In the example above, ``DRILL_HOST_PORT = "28047"`` has been added to
``dev/breeze/src/airflow_breeze/global_constants.py``.
+
+Then, you'll also need to set the host port as an environment variable for Docker commands in
+``dev/breeze/src/airflow_breeze/params/shell_params.py``, under the property ``env_variables_for_docker_commands``.
+For the example above, the following statement was added:
+
+.. code-block:: python
+
+ _set_var(_env, "DRILL_HOST_PORT", None, DRILL_HOST_PORT)
+
+The final setup step for the integration is adding a netcat check to verify that, once the integration is enabled, the
+service is accessible on its internal port.
+
+For that, you'll need to add the following in
``scripts/in_container/check_environment.sh`` under "Checking backend and
integrations".
+The code block for ``drill`` in this file looks as follows:
+
+.. code-block:: bash
+
+ if [[ ${INTEGRATION_DRILL} == "true" ]]; then
+ check_service "drill" "run_nc drill 8047" 50
+ fi
+
+Then, create the integration test file under ``tests/integration`` - remember
to prefix the file name with ``test_``,
+and to use the ``@pytest.mark.integration`` decorator. It is recommended to
define setup and teardown methods
+(``setup_method`` and ``teardown_method``, respectively) - you could look at
existing integration tests to learn more.
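A minimal skeleton of such a file could look like this (the class name, test name, and body are illustrative only, not taken from the repository):

```python
import pytest


# Illustrative skeleton of an integration test file; real tests would
# exercise the hook/operator against the container started by Breeze.
@pytest.mark.integration("drill")
class TestDrillIntegration:
    def setup_method(self):
        # Create connections or fixtures against the running service here.
        self.created_resources = []

    def teardown_method(self):
        # Clean up everything created in setup_method so tests stay isolated.
        self.created_resources.clear()

    def test_setup_starts_clean(self):
        assert self.created_resources == []
```

pytest calls ``setup_method`` before and ``teardown_method`` after every test method in the class, which keeps state in the external service from leaking between tests.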
+
+Before pushing to GitHub, make sure to run static checks (``breeze static-checks --only-my-changes``) to apply linters
+to the Python logic, as well as to update the command images under ``dev/breeze/docs/images``.
+
+When writing integration tests for components that also require Kerberos, you can enforce auto-enabling the latter by
+updating the ``compose_file()`` method in ``airflow_breeze.params.shell_params.ShellParams``. For example, to ensure that
+Kerberos is active for ``trino`` integration tests, the following code has been introduced:
+
+.. code-block:: python
+
+    if "trino" in integrations and "kerberos" not in integrations:
+        get_console().print(
+            "[warning]Adding `kerberos` integration as it is implicitly needed by trino",
+        )
+        compose_file_list.append(DOCKER_COMPOSE_DIR / "integration-kerberos.yml")
+
+
+
-----
For other kinds of tests, look at `Testing document <../09_testing.rst>`__