This is an automated email from the ASF dual-hosted git repository.

mobuchowski pushed a commit to branch openlineage-docs-guide
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 0944f5a5b5c2c8ce86e7f4d841308bed7e8f1f45
Author: Maciej Obuchowski <[email protected]>
AuthorDate: Tue Aug 22 14:45:02 2023 +0200

    openlineage: finish user guide

    Signed-off-by: Maciej Obuchowski <[email protected]>
---
 .../guides/developer.rst |  3 +
 .../guides/user.rst      | 89 +++++++++++++++++++---
 2 files changed, 81 insertions(+), 11 deletions(-)

diff --git a/docs/apache-airflow-providers-openlineage/guides/developer.rst b/docs/apache-airflow-providers-openlineage/guides/developer.rst
index 4d5f7a5c67..8f7da5672d 100644
--- a/docs/apache-airflow-providers-openlineage/guides/developer.rst
+++ b/docs/apache-airflow-providers-openlineage/guides/developer.rst
@@ -17,6 +17,9 @@
     under the License.
 
 
+.. _guides/developer:openlineage
+
+
 Implementing OpenLineage in Operators
 -------------------------------------
 
diff --git a/docs/apache-airflow-providers-openlineage/guides/user.rst b/docs/apache-airflow-providers-openlineage/guides/user.rst
index 8aef3f17c6..3fb934c003 100644
--- a/docs/apache-airflow-providers-openlineage/guides/user.rst
+++ b/docs/apache-airflow-providers-openlineage/guides/user.rst
@@ -20,19 +20,86 @@
 Using OpenLineage integration
 -----------------------------
 
-Install
-=======
+Usage
+=====
 
-To use OpenLineage
+No change to user DAG files is required to use OpenLineage. However, it needs to be configured.
+Primary, and recommended method of configuring OpenLineage Airflow Provider is Airflow configuration.
 
-Config
-======
+At minimum, one thing that needs to be set up in every case is ``Transport`` - where do you wish for
+your events to end up - for example `Marquez <https://marquezproject.ai/>`_. The ``transport`` field in configuration is used for that purpose.
 
-Primary method of configuring OpenLineage Airflow Provider is Airflow configuration.
+.. code-block:: ini
 
-One thing that needs to be set up in every case is ``Transport`` - where do you wish for
-your events to end up.
+    [openlineage]
+    transport = '{"type": "http", "url": "http://example.com:5000"}'
 
-Another option of configuration is using ``openlineage.yml`` file.
-Detailed description of that configuration method is in OpenLineage docs
-https://openlineage.io/docs/client/python#configuration
+
+If you want to look at OpenLineage events without sending them anywhere, you can set up ConsoleTransport - the events will end up in task logs.
+
+.. code-block:: ini
+
+    [openlineage]
+    transport = '{"type": "console"}'
+
+
+You can also configure OpenLineage transport using ``openlineage.yml`` file.
+Detailed description of that configuration method is in `OpenLineage docs <https://openlineage.io/docs/client/python#configuration>`_.
+To do that, you also need to set up path to the file in Airflow config, or point ``OPENLINEAGE_CONFIG`` variable to it:
+
+.. code-block:: ini
+
+    [openlineage]
+    config_path = '/path/to/openlineage.yml'
+
+Lastly, you can set up http transport using ``OPENLINEAGE_URL`` environment variable, passing it the URL target of the OpenLineage consumer.
+
+It's also very useful to set up OpenLineage namespace for this particular instance. If not set, it's using ``default`` namespace.
+That way, if you use multiple OpenLineage producers, events coming from them will be logically separated.
+
+.. code-block:: ini
+
+    [openlineage]
+    transport = '{"type": "http", "url": "http://example.com:5000"}'
+    namespace = 'my-team-airflow-instance`
+
+
+Additional Options
+==================
+
+You can disable sending OpenLineage events without uninstalling OpenLineage provider by setting ``disabled`` to true or setting ``OPENLINEAGE_DISABLED``
+environment variable to True.
+
+.. code-block:: ini
+
+    [openlineage]
+    transport = '{"type": "http", "url": "http://example.com:5000"}'
+    disabled = true
+
+
+Several operators - for example Python, Bash - will by default include their source code in their OpenLineage events. To prevent that, set ``disable_source_code`` to true.
+
+.. code-block:: ini
+
+    [openlineage]
+    transport = '{"type": "http", "url": "http://example.com:5000"}'
+    disable_source_code = true
+
+If you used OpenLineage previously, and use `Custom Extractors <https://openlineage.io/docs/integrations/airflow/extractors/custom-extractors>`_ feature, you can also use them in OpenLineage provider.
+Register the extractors using ``extractors`` config option.
+
+.. code-block:: ini
+
+    [openlineage]
+    transport = '{"type": "http", "url": "http://example.com:5000"}'
+    extractors = full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
+
+
+Other
+=====
+
+If you want to add OpenLineage coverage for particular operator, take a look at
+
+:ref:`guides/developer:openlineage`
+
+For more explanation visit `OpenLineage docs <https://openlineage.io/docs>`_
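
The ``[openlineage]`` options documented in this diff can presumably also be supplied through Airflow's general ``AIRFLOW__{SECTION}__{KEY}`` environment-variable override. The sketch below is illustrative only and assumes that override applies to the ``openlineage`` section as it does to other sections. The helper name ``openlineage_env`` is hypothetical and not part of the provider; the transport and namespace values are the ones used in the guide's examples.

```python
import json


def openlineage_env(transport: dict, namespace: str = "default") -> dict:
    """Map [openlineage] options to AIRFLOW__SECTION__KEY variable names.

    Hypothetical helper for illustration; it only builds strings and does
    not import or call Airflow or the OpenLineage provider.
    """
    options = {
        # The transport option is a JSON document, so serialize the dict.
        "transport": json.dumps(transport),
        # The guide states that the namespace falls back to "default".
        "namespace": namespace,
    }
    return {
        f"AIRFLOW__OPENLINEAGE__{key.upper()}": value
        for key, value in options.items()
    }


env = openlineage_env(
    {"type": "http", "url": "http://example.com:5000"},
    namespace="my-team-airflow-instance",
)
for name, value in env.items():
    print(f"{name}={value}")
```

Serializing with ``json.dumps`` keeps the transport value valid JSON, which avoids hand-quoting mistakes when the same settings are moved between ``airflow.cfg`` and the environment.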
