+1 binding - I've been following the doc and comments and I think this will make Lineage a realistic possibility for all.
-Ash

On 10 February 2023 23:26:48 GMT, Julien Le Dem <jul...@astronomer.io.INVALID> wrote:
>Dear Airflow community,
>
>Following the discussion thread over the past few weeks, I'd like to call a
>vote on AIP-53 OpenLineage in Airflow:
>https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
>
>The discussion thread is linked in the confluence doc if you wish to
>consult the history of the conversation. Thank you to all who contributed!
>
>This is my (non-binding!) +1, the vote will last until midnight (UTC) on
>Friday 17th February.
>
>Thanks,
>Julien
>
>*For reference, the Motivation section in the doc:*
>
>Operational lineage collection is a common need to understand dependencies
>between data pipelines and track end-to-end provenance of data. It enables
>many use cases from ensuring reliable delivery of data through
>observability to compliance and cost management.
>
>Publishing operational lineage is a core Airflow capability to enable
>troubleshooting and governance.
>
>OpenLineage <https://openlineage.io/> is a project part of the LFAI&Data
><https://lfaidata.foundation/projects/> foundation that provides a spec
>standardizing operational lineage collection and sharing across the data
>ecosystem. If it provides plugins for popular open source projects, its
>intent is very similar to OpenTelemetry <https://opentelemetry.io/> (also
>under the Linux Foundation umbrella): to remain a spec for lineage exchange
>that projects - open source or proprietary - implement.
>
>Built-in OpenLineage support in Airflow will make it easier and more
>reliable for Airflow users to publish their operational lineage through the
>OpenLineage ecosystem.
>
>The current external plugin maintained in the OpenLineage project depends
>on Airflow and operators internals and gets broken when changes are made on
>those. Having a built-in integration ensures a better first class support
>to expose lineage that gets tested alongside other changes and therefore is
>more stable.
>
>Today, OpenLineage consumers in the ecosystem include: Egeria
><https://egeria-project.org/features/lineage-management/overview/#the-openlineage-standard>
>(bank compliance), Marquez <https://marquezproject.ai/> (build your own metadata
>platform for compliance for example), Microsoft Purview
><https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/>
>(Governance, …), Astro <https://www.astronomer.io/why-openlineage/> (data
>observability), Amundsen
><https://www.amundsen.io/amundsen/databuilder/#openlineagetablelineageextractor>.
>AWS recently blogged about using OpenLineage in the AWS ecosystem
><https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/>.
>Other projects are at various levels of progress.
>
>On the producer side, there is support for open source projects like
>Airflow, dbt, Spark, Flink, GreatExpectations and proprietary warehouses
>like Snowflake
><https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/README.md>,
>BigQuery, Redshift
><https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/>
>through API integration or SQL parsing.
>
>Examples of users talking about their usage of OpenLineage can be found on
>the Openlineage blog
><https://openlineage.io/blog/openlineage-at-northwestern-mutual/>.
>
>This integration will also stimulate the continued growth of the
>OpenLineage ecosystem and create more value for Airflow users.