Understood. Thank you very much for the detailed description.

It indeed helps us better understand the contribution process and assess 
which route our provider should take (either the community provider route, or 
simply releasing it as part of our project and possibly adding it to the Airflow 
ecosystem page).

Kind regards,
Andy

-----Original Message-----
From: Jarek Potiuk <[email protected]> 
Sent: Thursday, March 31, 2022 2:32 PM
To: [email protected]
Subject: Re: Question about contributing new Provider


Technicalities (I am including a longer description so that you can set your
expectations):

What you did is good. Starting a discussion on the devlist is a good start.
As with everything in an Apache project, a small code change can be approved 
by one or two committers, but any bigger change needs to be:

* discussed on the devlist (you have already started this)
* brought to a point where consensus seems to have been reached
* accepted either by lazy consensus (if we all seem to agree) or by a vote:
https://www.apache.org/foundation/voting.html

Note that (see the voting rules) code modifications like this can be vetoed by 
a single committer during the voting process (one justified veto is enough to 
block it), so you should be fairly confident that nobody will veto it.
It is your job to guide the discussion towards consensus, and to start and run 
the vote when you think the time is right. Note that there are certain 
communication rules to follow, especially about making sure that everyone can 
participate, so such discussions tend to take quite some time (weeks or months).

More information and a general contribution guide can be found here:
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst

Your case:

I think we do not have precise rules; however, what is important is that the 
community (as a whole) commits to maintaining the code. Code contribution is not 
a one-way street; code is actually more often a liability than an asset.
We also need to make sure that someone in the community can test it, knows how 
to do so, and is able to validate that any fixes and changes there can be 
maintained. And the bigger the contribution, and the less "popular" a given 
provider is, the less likely we are to want to make it part of the community.

I think (but this is my personal opinion, and others will possibly chime in) 
we are moving away from the mode where we accept new providers "by default". We 
are even discussing whether we should give some providers back to the people 
who "own" the services, so that they can maintain them.

I personally think that before a provider gets accepted, we need to see that the 
service is used and popular. And certainly Airflow cannot (and should not) be 
used as a driver of that popularity. It is perfectly fine if you maintain your 
provider outside of Airflow's "Community Managed Providers". We have a 
dedicated chapter for that on our "ecosystem" page:
https://airflow.apache.org/ecosystem/#third-party-airflow-plugins-and-providers
So anyone who develops a provider is free to publish and release it, and even 
to make a PR to link to it from the Ecosystem page.

There are various pros and cons of being a "Community Provider".

Pros:
* It is released with the "ASF stamp of approval", which guarantees that it 
follows the rules
* It comes as an "apache-airflow-providers-*" package
* It comes as an "extra" of Airflow (though we might get rid of extras in 
the future)
* It gets tested automatically in CI with the latest version of Airflow

Cons:

* There is a certain burden and process that all community providers must 
follow (documentation, testing). We are also finally introducing automated 
system testing (see 
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests).
So I would say all future providers (if they use some external service) will 
have to have system tests implemented. That means a few things: first, there 
should be system tests; second, there should be some way to run them (credits, 
or free accounts with enough capacity to run regular testing in an automated 
way, donated to Airflow so that it can support the provider). We have not yet 
discussed the last point (credits and system tests being a condition), but 
this seems to be where we are heading, with the Amazon and Google providers 
leading the effort of implementing system tests (we have already got credits 
and commitments from both).

* We release our community providers on a regular cadence (monthly). We only 
very rarely break the cycle, so if you want better control over the release 
schedule, you will not be able to have it.

* Code committed to providers has to go through the regular review and 
approval process, and for many services it might seem slow. It is not uncommon 
for a PR to take weeks to get reviewed and approved. And (apart from nagging 
people) you have zero influence on the review process. Airflow is maintained by 
committers: individuals (often volunteers) who review the code and approve it 
when they can, when they have time, and when they don't have other, higher 
priority tasks. Becoming a community provider means that you accept this, and 
accept that you have zero control over it as an organization. Only individuals 
can become committers, so even if someone from your company becomes one at some 
point, the commitership goes with that person if he or she changes jobs. So 
you have to accept the fact that you, as an organization, have no real 
guaranteed influence on that. And if the service is not really popular, you 
risk that none of the committers is interested in it, or has the capacity and 
knowledge to review the code.

So you should first consider whether you really want to become a community 
provider, or whether it is better to release it and maintain it on your own.

I hope others will chime in. I personally do not know anything about VDK, and I 
do not know if any of the committers do. If they don't, this is a rather strong 
indication that you should go "your own" route.
But if you decide to try the "community" route, it will be your job to 
convince the committers that this is a good idea and to make sure there are no 
strong "vetoes" after the discussions.

J.

On Thu, Mar 31, 2022 at 12:43 PM Andon Andonov <[email protected]> wrote:
>
> Hello,
>
> We are developing a data engineering framework called Versatile Data Kit, 
> which allows data engineers to develop, deploy and manage data processing 
> workloads that we refer to as ``Data Jobs``. These jobs allow data engineers 
> to implement automated pull ingestion (E in ELT) and batch data 
> transformation (T in ELT) into a database.
>
> Currently, we are working on a Provider to integrate our project with Airflow, 
> and we would like to contribute it upstream at some point.
>
> The architecture specification for the Provider can be seen here.
>
> According to this FAQ, sub-section “Can I contribute my own provider to 
> Apache Airflow?”, we need to check whether the community would accept the 
> contribution. However, from what is written in that paragraph, it is a bit 
> difficult (at least for me) to understand whether there is a specific process 
> in place that needs to be followed, or whether a proposal should be put to a 
> vote.
>
> I have probably missed some piece of documentation explaining it, so I would 
> appreciate any help or tips pointing me to where to look.
>
> Kind Regards,
>
> Andon
