This is an automated email from the ASF dual-hosted git repository.
mobuchowski pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/airflow.git
The following commit(s) were added to refs/heads/main by this push:
new df77f37266b docs: Clarify HLL in extraction precedence (#63723)
df77f37266b is described below
commit df77f37266b25c9a8cb44f4672296558dbf830ce
Author: Kacper Muda <[email protected]>
AuthorDate: Mon Mar 16 14:23:05 2026 +0100
docs: Clarify HLL in extraction precedence (#63723)
---
devel-common/src/sphinx_exts/templates/openlineage.rst.jinja2 | 3 +++
providers/openlineage/docs/guides/developer.rst | 10 +++++++++-
2 files changed, 12 insertions(+), 1 deletion(-)
diff --git a/devel-common/src/sphinx_exts/templates/openlineage.rst.jinja2 b/devel-common/src/sphinx_exts/templates/openlineage.rst.jinja2
index 52c6a6df8c4..5f5e32516ad 100644
--- a/devel-common/src/sphinx_exts/templates/openlineage.rst.jinja2
+++ b/devel-common/src/sphinx_exts/templates/openlineage.rst.jinja2
@@ -69,6 +69,9 @@ the integration can go further. Besides recording which assets were read or writ
it may also extract the executed SQL text, external query/job IDs. For each query a separate pair of child OpenLineage
events is emitted.
+For details on when hook-level lineage is attached to the OpenLineage event and how it interacts with
+extractors and inlets/outlets, see :ref:`extraction_precedence:openlineage`.
+
.. important::
The level of detail captured varies between hooks and methods. Some may only report dataset information, while others
expose SQL text, query IDs and more. Review the hook implementation to confirm what lineage data is available.
diff --git a/providers/openlineage/docs/guides/developer.rst b/providers/openlineage/docs/guides/developer.rst
index c37bd8c366a..ed2651bc8f2 100644
--- a/providers/openlineage/docs/guides/developer.rst
+++ b/providers/openlineage/docs/guides/developer.rst
@@ -41,7 +41,15 @@ it's important to keep in mind the order in which OpenLineage looks for lineage
1. **Extractor** - check if there is a custom Extractor specified for Operator class name. Any custom Extractor registered by the user will take precedence over default Extractors defined in Airflow Provider source code (f.e. BashExtractor).
2. **OpenLineage methods** - if there is no Extractor explicitly specified for Operator class name, DefaultExtractor is used, that looks for OpenLineage methods in Operator.
-3. **Inlets and Outlets** - if there are no OpenLineage methods defined in the Operator, inlets and outlets are checked.
+3. **Hook Level Lineage** - when extractor or Openlineage methods return no inputs and no outputs, hook lineage is merged
+ with any other metadata produced (e.g. run facets, job facets). When neither extractor nor Openlineage methods
+ are present, hook lineage is used directly as the full lineage result. In both cases it takes precedence over inlets
+ and outlets.
+4. **Inlets and Outlets** - only consulted as a last resort when all of the above yield no datasets. This step
+ attempts to convert inlets and outlets into OpenLineage input/output datasets, which has limited support.
+ Note that inlets and outlets defined as Airflow Assets are always included in the ``airflow`` run facet
+ (under ``task.inlets`` / ``task.outlets``) regardless of whether this conversion succeeds —
+ so the ``airflow`` run facet is the most reliable place to look for inlet/outlet information.
If all the above options are missing, no lineage data is extracted from the Operator. You will still receive OpenLineage events
enriched with things like general Airflow facets, proper event time and type, but the inputs/outputs will be empty