+1 binding - I've been following the doc and comments and I think this will 
make Lineage a realistic possibility for all.

-Ash 

On 10 February 2023 23:26:48 GMT, Julien Le Dem <jul...@astronomer.io.INVALID> 
wrote:
>Dear Airflow community,
>
>Following the discussion thread over the past few weeks, I'd like to call a
>vote on AIP-53 OpenLineage in Airflow:
>https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
>
>The discussion thread is linked in the confluence doc if you wish to
>consult the history of the conversation. Thank you to all who contributed!
>
>This is my (non-binding!) +1, the vote will last until midnight (UTC) on
>Friday 17th February.
>
>Thanks,
>Julien
>
>*For reference, the Motivation section in the doc:*
>
>Operational lineage collection is a common need to understand dependencies
>between data pipelines and track end-to-end provenance of data. It enables
>many use cases from ensuring reliable delivery of data through
>observability to compliance and cost management.
>
>Publishing operational lineage is a core Airflow capability to enable
>troubleshooting and governance.
>
>OpenLineage <https://openlineage.io/> is a project that is part of the LFAI&Data
><https://lfaidata.foundation/projects/> foundation and provides a spec
>standardizing operational lineage collection and sharing across the data
>ecosystem. Although it provides plugins for popular open source projects, its
>intent is very similar to OpenTelemetry <https://opentelemetry.io/> (also
>under the Linux Foundation umbrella): to remain a spec for lineage exchange
>that projects - open source or proprietary - implement.
>
>Built-in OpenLineage support in Airflow will make it easier and more
>reliable for Airflow users to publish their operational lineage through the
>OpenLineage ecosystem.
>
>The current external plugin maintained in the OpenLineage project depends
>on Airflow and operator internals and breaks when those change. A built-in
>integration provides first-class lineage support that is tested alongside
>other changes and is therefore more stable.
>
>Today, OpenLineage consumers in the ecosystem include: Egeria
><https://egeria-project.org/features/lineage-management/overview/#the-openlineage-standard>
>(bank compliance), Marquez <https://marquezproject.ai/> (build your own
>metadata platform for compliance for example), Microsoft Purview
><https://learn.microsoft.com/en-us/samples/microsoft/purview-adb-lineage-solution-accelerator/azure-databricks-to-purview-lineage-connector/>
>(Governance, …), Astro <https://www.astronomer.io/why-openlineage/> (data
>observability), Amundsen
><https://www.amundsen.io/amundsen/databuilder/#openlineagetablelineageextractor>.
>AWS recently blogged about using OpenLineage in the AWS ecosystem
><https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/>.
>Other projects are at various levels of progress.
>
>On the producer side, there is support for open source projects like
>Airflow, dbt, Spark, Flink, Great Expectations and proprietary warehouses
>like Snowflake
><https://github.com/Snowflake-Labs/OpenLineage-AccessHistory-Setup/blob/main/README.md>,
>BigQuery, Redshift
><https://aws.amazon.com/blogs/big-data/automate-data-lineage-on-amazon-mwaa-with-openlineage/>
>through API integration or SQL parsing.
>
>Examples of users talking about their usage of OpenLineage can be found on
>the OpenLineage blog
><https://openlineage.io/blog/openlineage-at-northwestern-mutual/>.
>
>This integration will also stimulate the continued growth of the
>OpenLineage ecosystem and create more value for Airflow users.
