+1 to all points that Jarek raised. Plus some rather technical/implementation 
questions:

Fabric includes a lot of tools and features and also advertises a managed
Airflow setup (with a somewhat outdated Airflow version, 2.6.3). If a
new/additional provider package were added, what kind of operators, hooks,
plugins etc. would be expected? There are also a lot of connectors already in
Fabric; how do these connectors relate to operators in Airflow, and how would
they integrate? From the Microsoft docs I do not fully understand whether it is
rather a deeper integration of Airflow into Fabric or a deeper integration of
Fabric into Airflow, or what the interplay would look like.

________________________________
From: Jarek Potiuk <ja...@potiuk.com>
Sent: Wednesday, July 24, 2024 11:15:34 PM
To: dev@airflow.apache.org <dev@airflow.apache.org>
Subject: Re: [DISCUSS] Add the Microsoft Fabric Provider in Apache Airflow

I think an important question here is maintainability, and in particular the
role of Microsoft (which I assume is behind this proposal - more or less
directly). Could you please confirm and explain your affiliation with
Microsoft? I think a big part of the community here is that we are
transparent and open about our affiliations and who is behind the code
contributed here, so that we have clarity also about long-term
maintainability of the code.

I understand you work as a software developer for Microsoft? (from the
GitHub profile). And let me be very clear - it's not something directed at
you - but such a provider discussion is not really (or should not be) just
between the community and a single developer submitting the code, but between
the community and a team that will commit, and prove the commitment, that they
will maintain it and engage back in the community.

Could you (or someone else from Microsoft) please explain what is the role
of Microsoft in leading that integration and future maintenance? This is
something that for example Teradata team explained, promised, and - they
keep their promise actually.

Ideally we would love whoever from Microsoft leadership is behind that
(if that's the case) to explain what role they are going to take in future
maintenance, also whether they have plans to develop and maintain system test
dashboards for the current azure provider, and whether they have plans (after
that) to build a system dashboard as part of contributing the new provider. For
me this is an absolute prerequisite in case of such a provider - it MUST have
a system dashboard and the system dashboard MUST be maintained and run by the
stakeholder who is interested in having the provider in the community.

Let me explain why I am asking.

So far we have seen 0 activity from the Microsoft Fabric team that has Airflow
as a service (compared to Amazon, Google, Astronomer - who all here are
actively contributing to Airflow).

The Amazon, Google and Astronomer teams do not only solve issues and maintain
code for their respective providers but participate actively in developing core
Airflow. All three of them (and the Teradata team) build and maintain system
test dashboards
https://airflow.apache.org/ecosystem/#airflow-provider-system-test-dashboards
- where they contribute multiple 100s of system tests to their respective
providers, and they run the system tests and maintain them. This means that
they take responsibility to monitor and fix issues in their providers, and
having the dashboards showing status is not only good for them but also good
for us, because we know, when we release, that the provider to be released is
probably good. And we are even talking about next steps - machine readable
data from the dashboard that we will be able to aggregate to have an overview
of all those "big and important" providers.

We call it the "mixed governance" approach - where the code is contributed to
the community and it is developed according to the Apache Way and rules of
the ASF, while testing and maintenance are largely led by the respective
stakeholder teams, who contribute a lot of engineering effort back. They
do not expect to drop the code so that the "community" will keep maintaining
it. The leadership from Google, Astronomer and Amazon are all deeply involved
with the community (organizing Airflow Summit, taking active part in
Airflow Dev calls, discussions and planning).

Now, we see precisely zero such collaboration from Microsoft, despite us
nagging and reaching out in multiple ways.

It seems that the Azure provider is not "taken care of" by Microsoft (here I
might simply not be aware of some people contributing and being supported and
sponsored by Microsoft, so I might be wrong). But I do not see people from
Microsoft (or paid by Microsoft) who would actively fix bugs, develop system
tests and monitor their "greenness". And unlike in the case of Amazon, Google,
Astronomer and Teradata, I do not see anyone from Microsoft taking care of
issues raised in any of our channels (issues/discussions/slack channels). The
only commit I can associate with Microsoft is your
https://github.com/apache/airflow/pull/35091 where a new operator has been
added.

Again - I might not be aware of some of Microsoft's efforts here, but if I am
right, and nothing changes here - I assume we might expect the same
from the new provider - that the code will be dropped and the whole burden
of maintaining it will be on the maintainers of Airflow and the community.
And if we see such an approach - I think we all here in the community agree
that the answer is "no thanks, release and maintain your own provider,
please". We are not able to take on more burden to support something that
could and should be supported by the stakeholder who wants to make sure
things are working well for their services.

I think if we are going to seriously discuss this provider - we need to be
absolutely sure that there is commitment (and we should see it) from the
big stakeholder that is kind of Missing In Action here - while all the
others are fully present and contributing back.

This is at least my personal opinion - taken from years of collaboration
with all those stakeholders and seeing win-win-win (users - maintainers -
stakeholders) when such cooperation works.

Can you please bring someone from Microsoft (if you cannot speak yourself)
to explain what their plans are in this regard?

J





On Wed, Jul 24, 2024 at 4:24 AM ambika garg <ambikagarg1...@gmail.com>
wrote:

> Hi Apache Airflow Community,
>
> I hope this message finds you well.
>
> TL;DR: I am writing to propose the addition of a new provider to Apache
> Airflow for Microsoft Fabric <https://www.microsoft.com/microsoft-fabric>,
> an end-to-end, unified analytics and data platform. This integration will
> streamline workflow management and offer robust capabilities for Airflow
> users leveraging Microsoft Fabric's comprehensive services.
>
> *What is Microsoft Fabric?*
>
> Microsoft Fabric is an end-to-end, unified analytics and data platform
> designed for enterprises seeking a cohesive solution. Operating on a
> Software as a Service (SaaS) model, it provides a suite of services
> including Data Engineering, Data Factory, Data Science, Real-Time
> Analytics, Data Warehouse, and Databases in a single, integrated ecosystem,
> eliminating the need for disparate services from multiple vendors. Learn
> more about Microsoft Fabric
> <https://docs.microsoft.com/en-us/fabric/overview/>.
>
> *Why should Apache Airflow accept Microsoft Fabric as a provider?*
>
> 1.        *Leverage the Microsoft Fabric items*: By integrating Microsoft
> Fabric as a provider in Apache Airflow, we can leverage its comprehensive
> suite of services such as Fabric notebooks, pipelines, warehouses etc. to
> enhance workflow management for a variety of use cases.
>
> 2.        *Unified Platform*: Microsoft Fabric offers a comprehensive set
> of analytics experiences designed to work together seamlessly. Users don’t
> need to assemble and manage disparate services from multiple vendors,
> leading to more robust, simplified, and reliable workflows for those who
> already rely on Airflow for orchestration.
>
> 3.        *SaaS Model Efficiency*: As a SaaS platform, Microsoft Fabric
> offers scalability, maintenance, and updates handled by Microsoft, reducing
> the operational burden on users. Airflow users can leverage these
> efficiencies while orchestrating workflows that involve Fabric services.
>
> 4.        *Fabric is lake-centric and open*: Microsoft Fabric's
> lake-centric design addresses the complexity and messiness of traditional
> data lakes. By integrating Fabric with Airflow, users can leverage OneLake,
> a multi-cloud data lake, to simplify data management and reduce data
> duplication.
>
> 5.        *Market Demand*: As enterprises increasingly adopt Microsoft
> Fabric for their analytics needs, there will be growing demand for seamless
> integration with existing and well-established tools like Apache Airflow.
>
> What do you guys think?
>
> Best Regards,
> Ambika Garg
>
