Technicalities (I have written a longer description so you can set your
expectations):

What you did is good - starting a discussion on the devlist is the
right first step. As with everything in Apache projects, while a small
code change can simply be approved by one or two committers, any
bigger change needs to be:

* discussed on the devlist (you already started that)
* brought to the point where consensus seems to be reached
* either accepted by lazy consensus (if we all seem to agree) or put
to a vote:
https://www.apache.org/foundation/voting.html

Note that (see the voting rules) code modifications like this can
simply be vetoed by a single committer during the voting process (one
justified veto is enough to block it), so you should be rather
convinced that you will not get anyone's veto.
It is your job to guide the discussion towards consensus, and to start
and run the vote when you think the time is right. Note that there are
certain communication rules to follow - especially about making sure
that everyone can participate - so such discussions tend to take quite
some time (weeks or months).

More information and general "contribution" guide can be found here:
https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst

Your case:

I think we do not have precise rules; however, what is important is
that the community (as a whole) commits to maintaining the code. A
code contribution is not a one-way street - code is more often a
liability than an asset.
We also need to make sure that someone in the community can test it,
knows how to do it, and is able to validate that any fixes and changes
there can be maintained. And the bigger the contribution, and the less
"popular" a given provider is, the smaller the chance that we would
like to make it part of the community.

I think - but this is my personal opinion and possibly others will
chime in - we are moving away from the mode where we accept new
providers "by default". We even have some discussions on whether we
should give some providers back to the people who "own" the services,
so that they can maintain them.

I personally think that before a provider gets accepted, we need to
see that the service is used and popular. And Airflow certainly cannot
(and should not) be used as a driver of that popularity. It is
perfectly fine to maintain your provider outside of the Airflow
"Community Managed Providers". We have a dedicated chapter for that on
our "Ecosystem" page -
https://airflow.apache.org/ecosystem/#third-party-airflow-plugins-and-providers.
So anyone who develops a provider is free to publish it, release it,
and even make a PR to link to it from the Ecosystem page.

There are various Pros and Cons of being a "Community Provider":

Pros:
* It is released with the "ASF Stamp of Approval", which guarantees
that it follows the ASF rules
* It comes as an "apache-airflow-providers-*" package.
* It comes as an "extra" of Airflow (though we might get rid of the
extras in the future)
* It gets tested automatically in CI with the latest version of Airflow
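To make the packaging point above concrete: community providers are
published as separate PyPI distributions following the
"apache-airflow-providers-<name>" naming convention (for example
`pip install apache-airflow-providers-amazon`, or the equivalent
Airflow extra `pip install "apache-airflow[amazon]"` - "amazon" is
just an illustrative example). A small, standard-library-only sketch
that lists whichever community provider packages happen to be
installed in the current environment:

```python
# Community providers follow the "apache-airflow-providers-<name>" PyPI
# naming convention, so the ones installed in an environment can be
# discovered with the standard library alone (no Airflow import needed).
from __future__ import annotations

from importlib import metadata

PREFIX = "apache-airflow-providers-"


def installed_providers() -> list[str]:
    """Return the names of installed community provider distributions."""
    names = ((dist.metadata["Name"] or "") for dist in metadata.distributions())
    return sorted(name for name in names if name.startswith(PREFIX))


print(installed_providers())  # empty list if no providers are installed
```

This is only a sketch of the naming convention; the authoritative list
of community providers lives in the Airflow documentation.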

Cons:

* There is a certain burden and process that all community providers
must follow (documentation, testing) - and we are also finally
introducing automated system testing (see
https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-47+New+design+of+Airflow+System+Tests).
So I would say all future providers (if they use an external service)
will have to have system tests implemented. That means a few things:
first of all, there should be system tests; secondly, there should be
some way to run them regularly in an automated fashion (credits, or
free accounts with enough capacity, donated to Airflow so that we are
able to support the provider). We have not yet discussed the last
point (making credits and system tests a condition), but this seems to
be where we are heading - with Amazon and Google leading the effort of
implementing system tests for their providers (we already got credits
and commitments from both).

* We release our community providers in a regular (monthly) cadence -
we break the cycle only very rarely - so if you want finer control
over the release schedule, you will not be able to have it.

* Code committed to providers has to go through the regular review and
approval process, which for many services might seem slow. It is not
uncommon for a PR to take weeks to get through review and approval.
And (apart from nagging people) you have zero influence on the review
process. Airflow is maintained by committers - individuals (often
volunteers) - so they review and approve code when they can, have
time, and do not have higher-priority tasks. Becoming a community
provider means that you accept this, and accept that you have zero
control over it as an organisation. Only individuals can become
committers, so even if someone from your company becomes one at some
point in time, the committership goes with that person if he or she
changes jobs. So you have to accept the fact that you, as an
organisation, have no real guaranteed influence on that. And if the
service is not really popular, you risk that none of the committers is
interested in the code, or even has the capacity and knowledge to
review it.

So first of all you have to consider for yourself whether you really
want to become a community provider, or whether it is better for you
to release it and maintain it on your own.

I hope others will chime in. I personally do not know anything about
VDK, and I do not know if any of the committers do. If they don't,
this is a rather strong indication that you should go the "on your
own" route. But if you decide to try the "community" route, it will be
your job to convince the committers that this is a good idea and to
make sure you do not get strong "vetoes" after the discussions.

J.




On Thu, Mar 31, 2022 at 12:43 PM Andon Andonov <[email protected]> wrote:
>
> Hello,
>
>
>
> We are developing a data engineering framework, called Versatile Data Kit, 
> which allows data engineers to develop, deploy and manage data processing 
> workloads which we refer to as ``Data Jobs``. These jobs allow data engineers 
> to implement automated pull ingestion (E in ELT) and batch data 
> transformation (T in ELT) into a database.
>
>
>
> Currently, we are working on a Provider to integrate our project with Airflow 
> and would like to contribute it upstream at some point in time.
>
> The architecture specification for the Provider can be seen here.
>
>
>
> According to this FAQ, sub-section “Can I contribute my own provider to 
> Apache Airflow?”, we need to check if the community would accept the 
> contribution. However, from what is written in the paragraph, it makes it a 
> bit difficult (at least to me) to understand, if there is a specific process 
> in place that needs to be followed, or should a proposal be put for a vote?
>
>
>
> I have probably missed some piece of documentation explaining it, so would 
> appreciate any help or tips, pointing me where to look at.
>
>
>
> Kind Regards,
>
> Andon
