pawelgrochowicz commented on code in PR #55573:
URL: https://github.com/apache/airflow/pull/55573#discussion_r2368566695


##########
providers/google/src/airflow/providers/google/cloud/operators/datafusion.py:
##########
@@ -863,6 +867,34 @@ def execute_complete(self, context: Context, event: dict[str, Any]):
         )
         return event["pipeline_id"]
 
+    def get_openlineage_facets_on_complete(self, task_instance) -> OperatorLineage | None:
+        """Build and return OpenLineage facets and datasets for the completed pipeline start."""
+        from airflow.providers.common.compat.openlineage.facet import Dataset
+        from airflow.providers.google.cloud.openlineage.facets import DataFusionRunFacet
+        from airflow.providers.openlineage.extractors import OperatorLineage
+
+        pipeline_resource = (
+            f"projects/{self.project_id}/locations/{self.location}/instances/"
+            f"{self.instance_name}/pipelines/{self.pipeline_name}"
+        )
+
+        inputs = [Dataset(namespace="datafusion", name=pipeline_resource)]

Review Comment:
   I was following the Pub/Sub example from [the OpenLineage naming spec](https://openlineage.io/docs/spec/naming/#dataset-naming) and the way its pattern is defined. In Pub/Sub, the resource type (topic or subscription) is part of the name, for example:
   
   topic:{projectId}:{topicId}
   
   subscription:{projectId}:{subscriptionId}
   
   The namespace in this case is simply the service name: "pubsub".
   
   Another example is BigQuery, where the namespace is again just the service name and the full details are included in the name.
   
   For Data Fusion, things are slightly different. There is an instance, which acts as the container for all pipelines (similar to grouping multiple workflow definitions in one place). Within that instance, there are pipelines, which represent the actual workflows. This is why I followed the Pub/Sub pattern: the namespace is just the service name ("datafusion"), and the name carries the full path through the instance to the pipeline.
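
To make the comparison concrete, here is a rough sketch of the pattern I have in mind. `DatasetId` and the helper functions are hypothetical stand-ins used only for illustration; they are not part of the provider or of OpenLineage. The Pub/Sub patterns are the ones quoted above, and the Data Fusion name mirrors the resource path built in this diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetId:
    """Hypothetical stand-in for an OpenLineage dataset: just namespace + name."""

    namespace: str
    name: str


def pubsub_topic(project_id: str, topic_id: str) -> DatasetId:
    # Namespace is only the service name; the resource type and IDs go in the name.
    return DatasetId(namespace="pubsub", name=f"topic:{project_id}:{topic_id}")


def pubsub_subscription(project_id: str, subscription_id: str) -> DatasetId:
    return DatasetId(
        namespace="pubsub",
        name=f"subscription:{project_id}:{subscription_id}",
    )


def datafusion_pipeline(
    project_id: str, location: str, instance_name: str, pipeline_name: str
) -> DatasetId:
    # Same idea: service name as namespace, full path (instance -> pipeline) as name.
    return DatasetId(
        namespace="datafusion",
        name=(
            f"projects/{project_id}/locations/{location}/instances/"
            f"{instance_name}/pipelines/{pipeline_name}"
        ),
    )


if __name__ == "__main__":
    # Example values below are made up for illustration.
    print(pubsub_topic("my-project", "orders"))
    print(datafusion_pipeline("my-project", "us-central1", "my-instance", "etl"))
```

In each case the namespace identifies only the service, so two pipelines in different instances still get distinct names because the instance is part of the path.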



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
