potiuk commented on code in PR #62171:
URL: https://github.com/apache/airflow/pull/62171#discussion_r2847726769

##########
devel-common/src/sphinx_exts/templates/openlineage.rst.jinja2:
##########
@@ -16,15 +16,62 @@
 specific language governing permissions and limitations
 under the License.
 #}
-Core operators
-==============
-At the moment, two core operators support OpenLineage. These operators function as a 'black box,'
-capable of running any code, which might limit the extent of lineage extraction (e.g. lineage will usually not contain
-input/output datasets). To enhance the extraction of lineage information, operators can utilize the hooks listed
-below that support OpenLineage.
-- :class:`~airflow.providers.standard.operators.python.PythonOperator` (via :class:`airflow.providers.openlineage.extractors.python.PythonExtractor`)
-- :class:`~airflow.providers.standard.operators.bash.BashOperator` (via :class:`airflow.providers.openlineage.extractors.bash.BashExtractor`)
+Supported classes
+*****************
+
+Below is a list of Operators and Hooks that support OpenLineage extraction, along with specific DB types that are compatible with the supported SQL operators.
+
+.. important::
+
+    While we strive to keep the list of supported classes current,
+    please be aware that our updating process is automated and may not always capture everything accurately.
+    Detecting hook level lineage is challenging so make sure to double check the information provided below.
+
+What does "supported operator" mean?
+====================================
+
+**All Airflow operators will automatically emit OpenLineage events**, (unless explicitly disabled or skipped during
+scheduling, like EmptyOperator) regardless of whether they appear on the "supported" list.
+Every OpenLineage event will contain basic information such as:
+
+- Task and DAG run metadata (execution time, state, tags, parameters, owners, description, etc.)
+- Job relationship (DAG job that the task belongs to, upstream/downstream relationship between tasks in a DAG etc.)
+- Error message (in case of task failure)
+- Airflow and OpenLineage provider versions
+
+**"Supported" operators provide additional metadata** that enhances the lineage information:
+
+- **Input and output datasets** (sometimes with Column Level Lineage)
+- **Operator-specific details** that may include SQL query text and query IDs, source code, job IDs from external systems (e.g., Snowflake or BigQuery job ID), data quality metrics and other information.
+
+For example, a supported SQL operator will include the executed SQL query, query ID, and input/output table information
+in its OpenLineage events. An unsupported operator will still appear in the lineage graph, but without these details.
+
+.. tip::
+
+    You can easily implement OpenLineage support for any operator. See :ref:`guides/developer:openlineage`.
+
+
+.. _hook-lineage:
+
+Hook Level Lineage

Review Comment:
Nice description!

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
