GitHub user FANNG1 edited a discussion: Discuss adding Gravitino support to the OpenLineage community
## Background

Gravitino has OpenLineage integration support today, but the plugin is not open-sourced in the OpenLineage community. This makes it harder to maintain, review, and evolve, and to keep it aligned with OpenLineage's dataset model.

I would like to discuss whether we should first contribute Gravitino support to the OpenLineage community.

## Context

OpenLineage dataset naming is datasource-oriented. For example, the Spark Iceberg integration can emit the physical storage dataset as the primary dataset and add a table identifier through the `symlinks` facet.

Gravitino's logical resource model is different. A table is normally identified through:

- metalake
- catalog
- schema
- table

This does not map directly to a single OpenLineage dataset name without choosing a convention.

## Proposal

Add Gravitino support in the OpenLineage community first, and keep emitted dataset identifiers consistent with OpenLineage naming conventions.

On the Gravitino side, the server can translate OpenLineage dataset identifiers into Gravitino's internal resource model. For example, Gravitino could resolve a dataset by using one or more of:

- dataset namespace
- dataset name
- `symlinks` facet
- catalog dataset facet or a custom Gravitino facet
- Gravitino-specific configuration, such as the target metalake

This keeps OpenLineage producers aligned with OpenLineage naming while allowing Gravitino to map lineage events back to `metalake.catalog.schema.table`.
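To make the server-side translation concrete, here is a minimal Java sketch of how a resolver might turn an OpenLineage dataset identifier into `metalake.catalog.schema.table`. Everything in it is hypothetical: `GravitinoDatasetResolver`, the `Dataset`, `Symlink`, and `TableIdent` records, the namespace-to-catalog map, and the assumption that a resolvable name has the form `schema.table` are illustrations, not OpenLineage or Gravitino APIs. The sketch prefers a `symlinks` entry of type `TABLE` (assumed to carry the logical table name) and falls back to the primary namespace and name; the target metalake comes from configuration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: map an OpenLineage-style dataset identifier onto a
 * Gravitino metalake.catalog.schema.table identifier. None of these types
 * are real OpenLineage or Gravitino classes.
 */
public final class GravitinoDatasetResolver {

  /** One entry of a symlinks-style facet (namespace, name, type). */
  public record Symlink(String namespace, String name, String type) {}

  /** The parts of an OpenLineage dataset this sketch looks at. */
  public record Dataset(String namespace, String name, List<Symlink> symlinks) {}

  /** Gravitino's four-level table identifier. */
  public record TableIdent(String metalake, String catalog, String schema, String table) {
    @Override
    public String toString() {
      return String.join(".", metalake, catalog, schema, table);
    }
  }

  private final String metalake;                        // assumed: target metalake from server config
  private final Map<String, String> namespaceToCatalog; // assumed: OpenLineage namespace -> Gravitino catalog

  public GravitinoDatasetResolver(String metalake, Map<String, String> namespaceToCatalog) {
    this.metalake = metalake;
    this.namespaceToCatalog = Map.copyOf(namespaceToCatalog);
  }

  /** Try TABLE symlinks first, then fall back to the primary namespace/name. */
  public Optional<TableIdent> resolve(Dataset dataset) {
    for (Symlink link : dataset.symlinks()) {
      if ("TABLE".equals(link.type())) {
        Optional<TableIdent> ident = toIdent(link.namespace(), link.name());
        if (ident.isPresent()) {
          return ident;
        }
      }
    }
    return toIdent(dataset.namespace(), dataset.name());
  }

  // Assumes the namespace is mapped to a catalog and the name looks like "schema.table".
  private Optional<TableIdent> toIdent(String namespace, String name) {
    String catalog = namespaceToCatalog.get(namespace);
    if (catalog == null) {
      return Optional.empty();
    }
    String[] parts = name.split("\\.");
    if (parts.length != 2) {
      return Optional.empty();
    }
    return Optional.of(new TableIdent(metalake, catalog, parts[0], parts[1]));
  }

  public static void main(String[] args) {
    // Illustrative values only: the namespaces and names are made up.
    GravitinoDatasetResolver resolver = new GravitinoDatasetResolver(
        "demo_metalake", Map.of("hive://metastore:9083", "hive_catalog"));
    Dataset dataset = new Dataset(
        "s3://warehouse", "db/orders",
        List.of(new Symlink("hive://metastore:9083", "sales.orders", "TABLE")));
    // Prints demo_metalake.hive_catalog.sales.orders
    System.out.println(resolver.resolve(dataset).map(TableIdent::toString).orElse("unresolved"));
  }
}
```

Here the primary dataset is a physical storage location that Gravitino does not know about, so resolution succeeds only through the symlink. Checking symlinks before the primary name is one possible policy; a real implementation might instead consult a custom Gravitino facet first, or reject ambiguous matches.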
References:

- OpenLineage naming conventions: https://openlineage.io/docs/spec/naming
- Spark Iceberg handler: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/spark3/src/main/java/io/openlineage/spark3/agent/lifecycle/plan/catalog/iceberg/IcebergHandler.java
- Spark Iceberg example event with symlinks: https://github.com/OpenLineage/OpenLineage/blob/main/integration/spark/app/integrations/container/pysparkV2ReplaceTableAsSelectCompleteEvent.json

GitHub link: https://github.com/apache/gravitino/discussions/10850
