pawelgrochowicz commented on code in PR #55573:
URL: https://github.com/apache/airflow/pull/55573#discussion_r2368566695
##########
providers/google/src/airflow/providers/google/cloud/operators/datafusion.py:
##########
@@ -863,6 +867,34 @@ def execute_complete(self, context: Context, event: dict[str, Any]):
             )
         return event["pipeline_id"]
 
+    def get_openlineage_facets_on_complete(self, task_instance) -> OperatorLineage | None:
+        """Build and return OpenLineage facets and datasets for the completed pipeline start."""
+        from airflow.providers.common.compat.openlineage.facet import Dataset
+        from airflow.providers.google.cloud.openlineage.facets import DataFusionRunFacet
+        from airflow.providers.openlineage.extractors import OperatorLineage
+
+        pipeline_resource = (
+            f"projects/{self.project_id}/locations/{self.location}/instances/"
+            f"{self.instance_name}/pipelines/{self.pipeline_name}"
+        )
+
+        inputs = [Dataset(namespace="datafusion", name=pipeline_resource)]
Review Comment:
I was following the Pub/Sub example from [the OpenLineage naming
spec](https://openlineage.io/docs/spec/naming/#dataset-naming)
and the way its pattern is defined. In Pub/Sub, the resource type (topic or
subscription) is part of the dataset name, for example:
topic:{projectId}:{topicId}
subscription:{projectId}:{subscriptionId}
The namespace in this case is simply the service name: "pubsub".
Another example is BigQuery, where the namespace is again just the service
name, and the full resource details go in the name.
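To make the pattern concrete, here is a minimal sketch of how those namespace/name pairs are assembled. The `DatasetId` tuple and the helper functions are hypothetical and exist only for illustration; they are not part of OpenLineage or Airflow. The Pub/Sub formats are the ones quoted above, and the BigQuery `{project}.{dataset}.{table}` layout is my reading of the same naming spec.

```python
from typing import NamedTuple


class DatasetId(NamedTuple):
    """Hypothetical stand-in for an OpenLineage dataset identifier."""

    namespace: str
    name: str


def pubsub_topic(project_id: str, topic_id: str) -> DatasetId:
    # The namespace is just the service; the resource type prefixes the name.
    return DatasetId("pubsub", f"topic:{project_id}:{topic_id}")


def pubsub_subscription(project_id: str, subscription_id: str) -> DatasetId:
    return DatasetId("pubsub", f"subscription:{project_id}:{subscription_id}")


def bigquery_table(project_id: str, dataset: str, table: str) -> DatasetId:
    # Same idea: service name as namespace, full resource path in the name.
    return DatasetId("bigquery", f"{project_id}.{dataset}.{table}")
```

In both services the namespace stays constant, and everything needed to locate the resource lives in the name.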
For Data Fusion, things are slightly different. An instance acts as the
container for all pipelines (similar to grouping multiple workflow
definitions in one place), and the pipelines within that instance
represent the actual workflows. This is why I followed the Pub/Sub pattern:
the namespace is the service name, and both the instance and the pipeline
go in the name.
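As a standalone sketch of the naming used in this PR, the function below reproduces the `pipeline_resource` string from the diff outside the operator. The function name and the tuple return type are hypothetical; only the "datafusion" namespace and the path layout come from the change under review.

```python
def datafusion_pipeline_dataset(
    project_id: str, location: str, instance_name: str, pipeline_name: str
) -> tuple[str, str]:
    """Return a (namespace, name) pair for a Data Fusion pipeline (illustrative only)."""
    # The instance is the container and the pipeline is the workflow inside it,
    # so both appear in the name while the namespace is only the service.
    name = (
        f"projects/{project_id}/locations/{location}/instances/"
        f"{instance_name}/pipelines/{pipeline_name}"
    )
    return "datafusion", name
```

Two pipelines in the same instance then share a common name prefix and differ only in the final path segment.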