pankajkoti commented on code in PR #31817:
URL: https://github.com/apache/airflow/pull/31817#discussion_r1227630848
##########
docs/apache-airflow-providers-openlineage/guides/developer.rst:
##########
@@ -50,4 +49,25 @@ Instead of returning complete OpenLineage event, the
provider defines ``Operator
run_facets: dict[str, BaseFacet] = Factory(dict)
job_facets: dict[str, BaseFacet] = Factory(dict)
-OpenLineage integration takes care to enrich it with things like general
Airflow facets, proper event time and type.
+OpenLineage integration itself takes care to enrich it with things like
general Airflow facets, proper event time and type, creating proper OpenLineage
RunEvent.
+
+How to properly implement OpenLineage methods?
+==============================================
+
+There are several things worth noting when implementing OpenLineage in
operators.
+
+First, do not import OpenLineage methods on top-level, but in OL method itself.
+This allows users to use your provider even if they do not have OpenLineage
provider installed.
+
+Another important point is to make sure your provider returns
OpenLineage-compliant dataset names.
+It allows OpenLineage consumers to properly match information about datasets
gathered from different sources.
+
+Naming convention is described in the `OpenLineage docs
<https://openlineage.io/docs/spec/naming>`__.
+
+OpenLineage implementation should not waste time of users that do not use it.
+This means not doing heavy processing or network calls in the ``execute``
method that aren't used by it.
+Better option is to save relevant information in Operator attribute - and then
use it
+in OpenLineage method.
+
+Good example is ``BigQueryExecuteQueryOperator``. It saves ``job_ids`` of
queries that were executer.
Review Comment:
```suggestion
Good example is ``BigQueryExecuteQueryOperator``. It saves ``job_ids`` of
queries that were executed.
```
##########
docs/apache-airflow-providers-openlineage/guides/developer.rst:
##########
@@ -50,4 +49,25 @@ Instead of returning complete OpenLineage event, the
provider defines ``Operator
run_facets: dict[str, BaseFacet] = Factory(dict)
job_facets: dict[str, BaseFacet] = Factory(dict)
-OpenLineage integration takes care to enrich it with things like general
Airflow facets, proper event time and type.
+OpenLineage integration itself takes care to enrich it with things like
general Airflow facets, proper event time and type, creating proper OpenLineage
RunEvent.
+
+How to properly implement OpenLineage methods?
+==============================================
+
+There are several things worth noting when implementing OpenLineage in
operators.
+
+First, do not import OpenLineage methods on top-level, but in OL method itself.
+This allows users to use your provider even if they do not have OpenLineage
provider installed.
+
+Another important point is to make sure your provider returns
OpenLineage-compliant dataset names.
+It allows OpenLineage consumers to properly match information about datasets
gathered from different sources.
+
+Naming convention is described in the `OpenLineage docs
<https://openlineage.io/docs/spec/naming>`__.
+
+OpenLineage implementation should not waste time of users that do not use it.
+This means not doing heavy processing or network calls in the ``execute``
method that aren't used by it.
+Better option is to save relevant information in Operator attribute - and then
use it
Review Comment:
```suggestion
Better option is to save relevant information in Operator attributes - and
then use it
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]