JDarDagran commented on code in PR #37620:
URL: https://github.com/apache/airflow/pull/37620#discussion_r1501787601


##########
airflow/providers/openlineage/provider.yaml:
##########
@@ -58,65 +58,67 @@ config:
   openlineage:
     description: |
       This section applies settings for OpenLineage integration.
-      For backwards compatibility with `openlineage-python` one can still use
-      `openlineage.yml` file or `OPENLINEAGE_` environment variables. However, below
-      configuration takes precedence over those.
-      More in documentation - https://openlineage.io/docs/client/python#configuration.
+      More about configuration and its precedence can be found at
+      https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#transport-setup
     options:
       disabled:
         description: |
-          Set this to true if you don't want OpenLineage to emit events.
+          Disable sending events without uninstalling the OpenLineage Provider by setting this to true.
         type: boolean
         example: ~
         default: "False"
         version_added: ~
       disabled_for_operators:
         description: |
-          Semicolon separated string of Airflow Operator names to disable
+          Exclude some Operators from emitting OpenLineage events by passing a string of semicolon separated
+          full import paths of Operators to disable.
         type: string
         example: "airflow.operators.bash.BashOperator;airflow.operators.python.PythonOperator"
         default: ""
         version_added: 1.1.0
       namespace:
         description: |
-          OpenLineage namespace
+          Set the namespace that the lineage data belongs to, so that if you use multiple OpenLineage producers,
+          events coming from them will be logically separated.
         version_added: ~
         type: string
-        example: "food_delivery"
+        example: "my_airflow_instance_1"
         default: ~
       extractors:
         description: |
-          Semicolon separated paths to custom OpenLineage extractors.
+          Register custom OpenLineage Extractors by passing a string of semicolon separated full import paths.
         type: string
         example: full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
         default: ~
         version_added: ~
       config_path:
         description: |
-          Path to YAML config. This provides backwards compatibility to pass config as
+          Provide path to YAML config file. This provides backwards compatibility to pass config as
           `openlineage.yml` file.

Review Comment:
   ```suggestion
             Specify the path to the YAML configuration file.
             This ensures backwards compatibility with passing config through the `openlineage.yml` file.
   ```
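
Side note on `disabled_for_operators` and `extractors`: both take a single string of semicolon-separated full import paths. Below is a minimal Python sketch of how such a value could be split and matched against a class, assuming exact full-import-path matching. `parse_import_paths`, `full_import_path` and `is_disabled` are hypothetical helpers for illustration only, not the provider's actual implementation.

```python
from __future__ import annotations


def parse_import_paths(value: str) -> set[str]:
    """Split a semicolon-separated string of full import paths into a set.

    Hypothetical helper: surrounding whitespace and empty entries
    (e.g. from a trailing ";") are ignored.
    """
    return {path.strip() for path in value.split(";") if path.strip()}


def full_import_path(cls: type) -> str:
    """Return "module.ClassName" for a class, e.g. "builtins.int"."""
    return f"{cls.__module__}.{cls.__qualname__}"


def is_disabled(cls: type, disabled_for_operators: str) -> bool:
    """Hypothetical check: exact match of the class's full import path.

    Subclasses are NOT matched in this sketch.
    """
    return full_import_path(cls) in parse_import_paths(disabled_for_operators)


value = "airflow.operators.bash.BashOperator;airflow.operators.python.PythonOperator"
print(sorted(parse_import_paths(value)))
```

Exact matching keeps the sketch simple; whether subclasses of a listed Operator should also be excluded is a separate design decision.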



##########
docs/apache-airflow-providers-openlineage/guides/structure.rst:
##########
@@ -17,16 +17,60 @@
     under the License.
 
 
-Structure of OpenLineage Airflow integration
+OpenLineage Airflow integration
 --------------------------------------------
 
-OpenLineage integration implements AirflowPlugin. This allows it to be discovered on Airflow start and
-register Airflow Listener.
+OpenLineage is an open framework for data lineage collection and analysis.
+At its core is an extensible specification that systems can use to interoperate with lineage metadata.
+`Check out OpenLineage docs <https://openlineage.io/docs/>`_.
 
-The listener is then called when certain events happen in Airflow - when DAGs or TaskInstances start, complete or fail.
-For DAGs, the listener runs in Airflow Scheduler.
-For TaskInstances, the listener runs on Airflow Worker.
+Quickstart
+==========
+
+To instrument your Airflow instance with OpenLineage, see :ref:`guides/user:openlineage`.
+
+To implement OpenLineage support for Airflow Operators, see :ref:`guides/developer:openlineage`.
+
+What's in it for me?
+====================
+
+The metadata collected can answer questions like:
+
+- Why did a specific data transformation fail?
+- What are the upstream sources feeding into a certain dataset?
+- What downstream processes rely on this specific dataset?
+- Is my data fresh?
+- Can I identify the bottleneck in my data processing pipeline?
+- How did the latest code change affect data processing times?
+- How can I trace the cause of data inaccuracies in my report?
+- How are data privacy and compliance requirements being managed through the data's lifecycle?
+- Are there redundant data processes that can be optimized or removed?
+- What data dependencies exist for this critical report?
+
+Understanding complex inter-DAG dependencies and providing up-to-date runtime visibility into DAG execution can be challenging.
+OpenLineage integrates with Airflow to collect DAG lineage metadata so that inter-DAG dependencies are easily maintained
+and viewable via a lineage graph, while also keeping a catalog of historical runs of DAGs.
+
+.. image:: https://openlineage.io/assets/images/af-schematic-ad8c295a182cb32b94ee27b96727fa98.svg
+   :alt: airflow_lineage
+   :width: 1792
+
+For an OpenLineage backend that will receive events, you can use `Marquez <https://marquezproject.ai/>`_.
+
+.. image:: https://marquezproject.ai/img/screenshot.png

Review Comment:
   I'm not sure if putting external URLs for images is the correct approach. I didn't find any similar example in the current Airflow docs.



##########
docs/apache-airflow-providers-openlineage/guides/structure.rst:
##########
@@ -17,16 +17,60 @@
     under the License.
 
 
-Structure of OpenLineage Airflow integration
+OpenLineage Airflow integration
 --------------------------------------------
 
-OpenLineage integration implements AirflowPlugin. This allows it to be discovered on Airflow start and
-register Airflow Listener.
+OpenLineage is an open framework for data lineage collection and analysis.
+At its core is an extensible specification that systems can use to interoperate with lineage metadata.
+`Check out OpenLineage docs <https://openlineage.io/docs/>`_.
 
-The listener is then called when certain events happen in Airflow - when DAGs or TaskInstances start, complete or fail.
-For DAGs, the listener runs in Airflow Scheduler.
-For TaskInstances, the listener runs on Airflow Worker.
+Quickstart
+==========
+
+To instrument your Airflow instance with OpenLineage, see :ref:`guides/user:openlineage`.
+
+To implement OpenLineage support for Airflow Operators, see :ref:`guides/developer:openlineage`.
+
+What's in it for me?
+====================
+
+The metadata collected can answer questions like:
+
+- Why did a specific data transformation fail?
+- What are the upstream sources feeding into a certain dataset?
+- What downstream processes rely on this specific dataset?
+- Is my data fresh?
+- Can I identify the bottleneck in my data processing pipeline?
+- How did the latest code change affect data processing times?
+- How can I trace the cause of data inaccuracies in my report?
+- How are data privacy and compliance requirements being managed through the data's lifecycle?
+- Are there redundant data processes that can be optimized or removed?
+- What data dependencies exist for this critical report?
+
+Understanding complex inter-DAG dependencies and providing up-to-date runtime visibility into DAG execution can be challenging.
+OpenLineage integrates with Airflow to collect DAG lineage metadata so that inter-DAG dependencies are easily maintained
+and viewable via a lineage graph, while also keeping a catalog of historical runs of DAGs.
+
+.. image:: https://openlineage.io/assets/images/af-schematic-ad8c295a182cb32b94ee27b96727fa98.svg

Review Comment:
   I'm not sure if putting external URLs for images is the correct approach. I didn't find any similar example in the current Airflow docs.



##########
docs/apache-airflow-providers-openlineage/guides/structure.rst:
##########
@@ -17,16 +17,60 @@
     under the License.
 
 
-Structure of OpenLineage Airflow integration
+OpenLineage Airflow integration
 --------------------------------------------
 
-OpenLineage integration implements AirflowPlugin. This allows it to be discovered on Airflow start and
-register Airflow Listener.
+OpenLineage is an open framework for data lineage collection and analysis.
+At its core is an extensible specification that systems can use to interoperate with lineage metadata.

Review Comment:
   ```suggestion
   At its core it is an extensible specification that systems can use to interoperate with lineage metadata.
   ```



##########
airflow/providers/openlineage/provider.yaml:
##########
@@ -58,65 +58,67 @@ config:
   openlineage:
     description: |
       This section applies settings for OpenLineage integration.
-      For backwards compatibility with `openlineage-python` one can still use
-      `openlineage.yml` file or `OPENLINEAGE_` environment variables. However, below
-      configuration takes precedence over those.
-      More in documentation - https://openlineage.io/docs/client/python#configuration.
+      More about configuration and its precedence can be found at
+      https://airflow.apache.org/docs/apache-airflow-providers-openlineage/stable/guides/user.html#transport-setup
     options:
       disabled:
         description: |
-          Set this to true if you don't want OpenLineage to emit events.
+          Disable sending events without uninstalling the OpenLineage Provider by setting this to true.
         type: boolean
         example: ~
         default: "False"
         version_added: ~
       disabled_for_operators:
         description: |
-          Semicolon separated string of Airflow Operator names to disable
+          Exclude some Operators from emitting OpenLineage events by passing a string of semicolon separated
+          full import paths of Operators to disable.
         type: string
         example: "airflow.operators.bash.BashOperator;airflow.operators.python.PythonOperator"
         default: ""
         version_added: 1.1.0
       namespace:
         description: |
-          OpenLineage namespace
+          Set the namespace that the lineage data belongs to, so that if you use multiple OpenLineage producers,
+          events coming from them will be logically separated.
         version_added: ~
         type: string
-        example: "food_delivery"
+        example: "my_airflow_instance_1"
         default: ~
       extractors:
         description: |
-          Semicolon separated paths to custom OpenLineage extractors.
+          Register custom OpenLineage Extractors by passing a string of semicolon separated full import paths.
         type: string
         example: full.path.to.ExtractorClass;full.path.to.AnotherExtractorClass
         default: ~
         version_added: ~
       config_path:
         description: |
-          Path to YAML config. This provides backwards compatibility to pass config as
+          Provide path to YAML config file. This provides backwards compatibility to pass config as
           `openlineage.yml` file.
         version_added: ~
         type: string
-        example: ~
+        example: "full/path/to/openlineage.yml"
         default: ""
       transport:
         description: |
-          OpenLineage Client transport configuration. It should contain type
-          and additional options per each type.
+          Pass OpenLineage Client transport configuration as JSON string. It should contain type of the
+          transport and additional options (different for each transport type). For more details see:
+          https://openlineage.io/docs/client/python/#built-in-transport-types
 
           Currently supported types are:
 
             * HTTP
             * Kafka
             * Console
+            * File
         type: string
-        example: '{"type": "http", "url": "http://localhost:5000"}'
+        example: '{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}'
         default: ""
         version_added: ~
       disable_source_code:
         description: |
-          If disabled, OpenLineage events do not contain source code of particular
-          operators, like PythonOperator.
+          Disable including source code in OpenLineage events by setting this to true. Several Operators (e.g.
+          Python, Bash) will by default include their source code in their OpenLineage events if not disabled.

Review Comment:
   ```suggestion
             Disable the inclusion of source code in OpenLineage events by setting this to `true`.
             By default, several Operators (e.g. Python, Bash) will include their source code in the events
             unless this is disabled.
   ```
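
Since `transport` is passed as a JSON string, a value can be sanity-checked with Python's `json` module before putting it into `airflow.cfg` or the `AIRFLOW__OPENLINEAGE__TRANSPORT` environment variable (Airflow's usual `AIRFLOW__{SECTION}__{KEY}` mapping). A minimal sketch: `check_transport` is a hypothetical helper, and the lowercase type names are an assumption based on the example value and the listed types, not the provider's actual validation logic.

```python
import json

# Assumed lowercase type identifiers, mirroring the supported types listed
# in the `transport` description (HTTP, Kafka, Console, File).
SUPPORTED_TYPES = {"http", "kafka", "console", "file"}


def check_transport(raw: str) -> dict:
    """Parse a transport config string and check its "type" field.

    Hypothetical helper for illustration only.
    """
    config = json.loads(raw)  # raises json.JSONDecodeError on malformed input
    if not isinstance(config, dict):
        raise ValueError("transport config must be a JSON object")
    transport_type = config.get("type")
    if transport_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported transport type: {transport_type!r}")
    return config


example = '{"type": "http", "url": "http://localhost:5000", "endpoint": "api/v1/lineage"}'
print(check_transport(example)["endpoint"])  # api/v1/lineage
```

Any stray character after a JSON string value (for instance a `;` right after the URL) makes `json.loads` raise `json.JSONDecodeError`, which is a subclass of `ValueError`, so a single `except ValueError` covers both malformed JSON and an unknown type in this sketch.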


