This is an automated email from the ASF dual-hosted git repository.

potiuk pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git


The following commit(s) were added to refs/heads/main by this push:
     new d872f7ed93 Update integration tests docs and add "Writing Integration Tests" guide (#39986)
d872f7ed93 is described below

commit d872f7ed932ba9b66866539071f0a990744c2e90
Author: Shahar Epstein <[email protected]>
AuthorDate: Sat Jun 1 15:57:16 2024 +0300

    Update integration tests docs and add "Writing Integration Tests" guide (#39986)
    
    Co-authored-by: Jarek Potiuk <[email protected]>
---
 contributing-docs/testing/integration_tests.rst | 139 +++++++++++++++++++++---
 1 file changed, 125 insertions(+), 14 deletions(-)

diff --git a/contributing-docs/testing/integration_tests.rst b/contributing-docs/testing/integration_tests.rst
index 2c55d1f857..80cf169d90 100644
--- a/contributing-docs/testing/integration_tests.rst
+++ b/contributing-docs/testing/integration_tests.rst
@@ -18,9 +18,11 @@
 Airflow Integration Tests
 =========================
 
-Some of the tests in Airflow are integration tests. These tests require ``airflow`` Docker
-image and extra images with integrations (such as ``celery``, ``mongodb``, etc.).
-The integration tests are all stored in the ``tests/integration`` folder.
+Integration tests in Airflow verify the interactions between Airflow components and external services
+that run as separate Docker containers, without connecting to any external API on the internet.
+These tests require the ``airflow`` Docker image and extra images with integrations (such as ``celery``, ``mongodb``, etc.).
+The integration tests are all stored in the ``tests/integration`` folder and, like the unit tests, they all run
+using `pytest <http://doc.pytest.org/en/latest/>`_, but they are skipped by default unless the ``--integration`` flag is passed to pytest.
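+As a rough illustration of the opt-in mechanism described above, here is how such an ``--integration``
+flag is commonly wired up in a ``conftest.py``. This is a hedged sketch of the general pytest pattern,
+NOT Airflow's actual conftest implementation:

```python
# Sketch of an opt-in "--integration" pytest flag in a conftest.py.
# Illustrative only; Airflow's real conftest differs.
import pytest


def pytest_addoption(parser):
    # Allow e.g. "pytest --integration mongo --integration celery".
    parser.addoption(
        "--integration",
        action="append",
        default=[],
        help="Enable tests for the given integration",
    )


def pytest_collection_modifyitems(config, items):
    # Skip any test marked @pytest.mark.integration(<name>) whose
    # integration was not enabled on the command line.
    enabled = set(config.getoption("--integration"))
    skip_marker = pytest.mark.skip(reason="integration not enabled")
    for item in items:
        marker = item.get_closest_marker("integration")
        if marker and marker.args and marker.args[0] not in enabled:
            item.add_marker(skip_marker)
```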
 
 **The outline for this document in GitHub is available at top-right corner button (with 3-dots and 3 lines).**
 
@@ -28,18 +30,20 @@ Enabling Integrations
 ---------------------
 
 Airflow integration tests cannot be run in the local virtualenv. They can only run in the Breeze
-environment with enabled integrations and in the CI. See `CI <../../dev/breeze/doc/ci/README.md>`_ for
+environment and in the CI, with their respective integrations enabled. See `CI <../../dev/breeze/doc/ci/README.md>`_ for
 details about Airflow CI.
 
-When you are in the Breeze environment, by default, all integrations are disabled. This enables only true unit tests
-to be executed in Breeze. You can enable the integration by passing the ``--integration <INTEGRATION>``
-switch when starting Breeze. You can specify multiple integrations by repeating the ``--integration`` switch
-or using the ``--integration all-testable`` switch that enables all testable integrations and
-``--integration all`` switch that enables all integrations.
+When you initiate a Breeze environment, all integrations are disabled by default, so that only unit tests
+are executed in Breeze. You can enable an integration by passing the ``--integration <INTEGRATION>``
+switch when starting Breeze, either with ``breeze shell`` or with ``breeze start-airflow``. As there is no need to simulate
+a full Airflow setup during integration tests, using ``breeze shell`` (or simply ``breeze``) to run them is
+sufficient. You can specify multiple integrations by repeating the ``--integration`` switch, or use the ``--integration all-testable`` switch
+to enable all testable integrations. You may use the ``--integration all`` switch to enable all integrations,
+including non-testable ones such as ``openlineage``.
 
 NOTE: Every integration requires a separate container with the corresponding integration image.
 These containers take precious resources on your PC, mainly the memory. The started integrations are not stopped
-until you stop the Breeze environment with the ``stop`` command and started with the ``start`` command.
+until you stop the Breeze environment with the ``breeze down`` command.
 
 The following integrations are available:
 
@@ -77,18 +81,20 @@ The following integrations are available:
 
 .. END AUTO-GENERATED INTEGRATION LIST'
 
-To start the ``mongo`` integration only, enter:
+To start a shell with the ``mongo`` integration enabled, enter:
 
 .. code-block:: bash
 
     breeze --integration mongo
 
-To start ``mongo`` and ``cassandra`` integrations, enter:
+You can pass the ``--integration`` option multiple times, once for each integration that you want to enable.
+For example, to start a shell with both the ``mongo`` and ``cassandra`` integrations enabled, enter:
 
 .. code-block:: bash
 
     breeze --integration mongo --integration cassandra
 
+
 To start all testable integrations, enter:
 
 .. code-block:: bash
@@ -99,7 +105,7 @@ To start all integrations, enter:
 
 .. code-block:: bash
 
-    breeze --integration all-testable
+    breeze --integration all
 
 Note that Kerberos is a special kind of integration. Some tests run differently when
 Kerberos integration is enabled (they retrieve and use a Kerberos authentication token) and differently when the
@@ -109,7 +115,7 @@ for the CI system should run all tests with the Kerberos integration enabled to
 Running Integration Tests
 -------------------------
 
-All tests using an integration are marked with a custom pytest marker ``pytest.mark.integration``.
+All integration tests are marked with a custom pytest marker ``pytest.mark.integration``.
 The marker has a single parameter - the name of integration.
 
 Example of the ``celery`` integration test:
@@ -172,6 +178,111 @@ Runs all mongo DB tests:
 
        breeze testing integration-tests --db-reset --integration mongo
 
+Writing Integration Tests
+-------------------------
+Before writing integration tests, you first need to make the integration itself (i.e., the service) available for use.
+To do so, create a Docker Compose YAML file under ``scripts/ci/docker-compose``, named
+``integration-<INTEGRATION>.yml``. The file should define one service for the integration, and another one
+for the Airflow instance that depends on it. It is recommended to stick to the following guidelines:
+
+
+1. Set ``services::<INTEGRATION>::container_name`` to the service's name and give it an appropriate description under
+``services::<INTEGRATION>::labels::breeze.description``, so that it is easier to find in Docker for debugging
+purposes.
+
+2. Use an official stable release of the service with a pinned version. When there are several possible
+images, you should probably pick the latest version that is supported by Airflow.
+
+3. Set ``services::<INTEGRATION>::restart`` to "on-failure".
+
+4. For integrations that require persisting data (for example, databases), define a volume at ``volumes::<VOLUME_NAME>``
+and mount it to the data path in the container by listing it under ``services::<INTEGRATION>::volumes``
+(see example).
+
+5. Check which ports should be exposed to use the service, and carefully validate that these ports are not in use by other
+integrations (consult the community on what to do if such a conflict happens). To avoid conflicts with the host's ports, it is
+good practice to prefix the corresponding host port with a number (usually 2), parametrize it, and list the parameter
+under the ``# Initialise base variables`` section in ``dev/breeze/src/airflow_breeze/global_constants.py``.
+
+6. In some cases you might need to change the entrypoint of the service's container, for example, by setting
+``stdin_open: true``.
+
+7. In the Airflow service definition, ensure that it depends on the integration's service (``depends_on``) and set
+the environment variable ``INTEGRATION_<INTEGRATION>`` to ``true``.
+
+8. If you need to mount a file (for example, a configuration file), you can put it under ``scripts/ci/docker-compose``
+(or a subfolder of this path) and list it under ``services::<INTEGRATION>::volumes``.
+
+For example, ``integration-drill.yml`` looks as follows:
+
+  .. code-block:: yaml
+
+      version: "3.8"
+      services:
+        drill:
+          container_name: drill
+          image: "apache/drill:1.21.1-openjdk-17"
+          labels:
+            breeze.description: "Integration required for drill operator and hook."
+          volumes:
+            - drill-db-volume:/data
+            - ./drill/drill-override.conf:/opt/drill/conf/drill-override.conf
+          restart: "on-failure"
+          ports:
+            - "${DRILL_HOST_PORT}:8047"
+          stdin_open: true
+        airflow:
+          depends_on:
+            - drill
+          environment:
+            - INTEGRATION_DRILL=true
+      volumes:
+        drill-db-volume:
+
+
+In the example above, ``DRILL_HOST_PORT = "28047"`` has been added to ``dev/breeze/src/airflow_breeze/global_constants.py``.
+
+Then, you'll also need to set the host port as an environment variable for Docker commands in ``dev/breeze/src/airflow_breeze/params/shell_params.py``,
+under the property ``env_variables_for_docker_commands``.
+For the example above, the following statement was added:
+
+.. code-block:: python
+
+    _set_var(_env, "DRILL_HOST_PORT", None, DRILL_HOST_PORT)
+
+The final setup step for the integration is adding a netcat check to verify that, once the integration is enabled,
+the service is accessible on its internal port.
+
+For that, you'll need to add the following in ``scripts/in_container/check_environment.sh`` under "Checking backend and integrations".
+The code block for ``drill`` in this file looks as follows:
+
+.. code-block:: bash
+
+    if [[ ${INTEGRATION_DRILL} == "true" ]]; then
+        check_service "drill" "run_nc drill 8047" 50
+    fi
+
+Then, create the integration test file under ``tests/integration`` - remember to prefix the file name with ``test_``
+and to use the ``@pytest.mark.integration`` decorator. It is recommended to define setup and teardown methods
+(``setup_method`` and ``teardown_method``, respectively) - you can look at existing integration tests to learn more.
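+Building on the guidance above, a minimal skeleton of such a test file might look like the sketch below.
+All names are hypothetical and only illustrate the recommended shape (the ``drill`` marker follows the
+example integration used earlier in this guide):

```python
# Illustrative skeleton of an integration test file under tests/integration.
# The class and test names are hypothetical; only the structure matters.
import pytest


@pytest.mark.integration("drill")
class TestDrillIntegrationSketch:
    def setup_method(self):
        # Acquire resources needed by every test, e.g. open a connection
        # to the service started by the docker-compose integration file.
        self.records = []

    def teardown_method(self):
        # Release whatever setup_method acquired.
        self.records.clear()

    def test_service_roundtrip(self):
        # A real test would interact with the integration's service here.
        self.records.append("ok")
        assert self.records == ["ok"]
```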
+
+Before pushing to GitHub, make sure to run static checks (``breeze static-checks --only-my-changes``) to apply linters
+to the Python logic, as well as to update the command images under ``dev/breeze/docs/images``.
+
+When writing integration tests for components that also require Kerberos, you can enforce auto-enabling the latter by
+updating the ``compose_file()`` method in ``airflow_breeze.params.shell_params.ShellParams``. For example, to ensure that
+Kerberos is active for ``trino`` integration tests, the following code has been introduced:
+
+.. code-block:: python
+
+        if "trino" in integrations and "kerberos" not in integrations:
+            get_console().print(
+                "[warning]Adding `kerberos` integration as it is implicitly needed by trino",
+            )
+            compose_file_list.append(DOCKER_COMPOSE_DIR / "integration-kerberos.yml")
+
+
+
 -----
 
 For other kinds of tests look at `Testing document <../09_testing.rst>`__
