Two fast/easy ways to find them:

1. https://registry.astronomer.io/
2. Using the new classifier: https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
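For provider authors, showing up in that second search is just a matter of declaring the trove classifier in the package metadata. A minimal sketch, assuming a setuptools-based build; the distribution name, version, and Airflow version floor are hypothetical:

    # setup.py -- minimal sketch of a standalone provider package declaring the
    # "Framework :: Apache Airflow :: Provider" trove classifier so it appears
    # in the PyPI search linked above. Names and versions here are hypothetical.
    from setuptools import find_packages, setup

    setup(
        name="example-airflow-provider-foo",         # hypothetical distribution name
        version="1.0.0",
        packages=find_packages(),
        install_requires=["apache-airflow>=2.2.0"],  # assumed minimum Airflow version
        classifiers=[
            "Framework :: Apache Airflow :: Provider",
        ],
    )

Once released with that metadata, users install it like any other package (pip install example-airflow-provider-foo), whether it is community-managed or 3rd-party.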
On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <ferru...@amazon.com.INVALID> wrote:
>I still think that easy inclusion with a defined pruning process is best, but it's looking like that is the minority opinion. In which case, IFF we are going to be keeping them separate, then I definitely agree that there needs to be a fast/easy/convenient way to find them.
>
>________________________________________
>From: Jarek Potiuk <ja...@potiuk.com>
>Sent: Monday, April 25, 2022 7:17 AM
>To: dev@airflow.apache.org
>Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
>
>CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe.
>
>Just to come back to it (please, everyone, a little patience - I think some people have not chimed in yet due to the 2.3.0 "focus", so this discussion might take a little more time).
>
>My current thinking on it so far:
>
>* I am not really in the camp of "let's not add any more providers at all", and also not in the camp of "let's accept all providers that are good-quality code". I think there are a few providers which, "after fulfilling all the criteria", could be added - mostly open-source standards, generic, established technologies - but it should be a rather limited and rare event.
>
>* When there is a proprietary service which does not have too broad a reach, and it's not likely that we will have some committers who will be maintaining it - because they are users - the default option should be to make standalone, per-service providers. The difficulty here is to set the right "non-quality" criteria - but I think we really want to limit any new code to maintain. Here maybe we can have some more concrete criteria proposed - so that we do not have to vote individually on each proposed provider - and so that those who want to propose a provider could check for themselves, by reading the criteria, what's best for them.
>
>* We might improve our "providers" list on the "ecosystem" page to make providers stand out a bit more (maybe simply put them on top and make a clearly visible section). We are not going to maintain and keep a nice "registry" similar to Astronomer's one (we could even actually make the link to the Astronomer registry more prominent as the way to "search" for providers on our Ecosystem Page). We could also add a link to PyPI with the "airflow provider" classifier on the ecosystem page as another way of searching for providers. All that is perfectly fine, I think, with the ASF policies and spirit. And it will be good for discovery.
>
>WDYT?
>
>J.
>
>On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <samh...@union.ai> wrote:
>>
>> Hello!
>>
>> The reason behind submitting the Flyte provider to the Airflow repository is that we felt it'd be effortless for the Airflow users to use the integration. Moreover, since it'd be under the umbrella of Airflow, we estimated that the Airflow users would not hesitate to use the provider.
>>
>> We could definitely have this as a standalone provider, but the easy-to-get-started incentive of Airflow providers seemed like a better option.
>>
>> If there's a sophisticated plan in place for having standalone providers in PyPI, we're up for it.
>>
>> Thanks,
>> Samhita
>>
>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <alex...@gmail.com> wrote:
>>>
>>> Hello all
>>>
>>> I want to try to explain the motivation behind the submission of the Delta Sharing provider:
>>>
>>> Let me start with the fact that the original issue was created against the Airflow repository, and it was accepted as potential new functionality. And the discussion about new providers started almost on the day the PR was submitted :-)
>>> Delta Sharing is an OSS project under the umbrella of the Linux Foundation that defines a protocol and reference implementations. It was started by Databricks, but has other contributors as well - that's why it wasn't pushed into the Databricks provider, as it's not specific to Databricks.
>>> Another thought about submitting it as a separate provider was to get more people interested in this functionality and build additional integrations on top of it.
>>> Another important aspect of having providers in the Airflow repository is that they are tested together with changes in the core of Airflow.
>>>
>>> I completely understand the concerns about more maintenance effort, but my personal point of view (about it below) is similar to Rafal's & John's - if there are well-defined criteria & plans for decommissioning or something like that, then providers could be part of the releases, etc.
>>>
>>> I just want to add that although I'm employed by Databricks, I'm not a part of the development team - I'm in the field team that works with customers, sees how they are using different tools, sees their pain points, etc. Most of the work so far was done on my own time - I'm doing some coordination, but most of the new functionality (AAD tokens support, Repos, Databricks SQL operators, etc.) is coming from seeing customers using Airflow together with Databricks.
>>>
>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz <rafalbieg...@google.com.invalid> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I think that we will need to find some middle ground here - we are trying to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would also add a 4th dimension - Airflow Service Provider :).
>>>>
>>>> Airflow users - whether they do self-managed Airflow or use "managed Airflow" provided by others - are beneficiaries of the fact that Airflow has a decent portfolio of providers. It's not only a guarantee that these providers should work fine and that they meet Airflow coding/testing standards. It's also a kind of guarantee that once they start using Airflow with providers backed by the Airflow community, they won't be on their own when it comes to troubleshooting/updating/etc. It will be much easier for them to convince their companies to use Airflow for production use cases, as the Airflow platform (core + providers) is tested/maintained by the Airflow community.
>>>>
>>>> Keeping providers within the Airflow repository generates integration and maintenance work on the Airflow community side. On the other hand, if this work is not done within the community, then this effort would need to be done by all users to a certain extent.
>>>> So from this perspective it's more optimal for the community to do it, so users can use off-the-shelf Airflow for the majority of their use cases.
>>>>
>>>> When it comes to accepting new providers - I like John's suggestions:
>>>> a) a well-defined standard that needs to be met by providers - passing the "provider qualification" would be some effort, so each service provider would need to decide if it wouldn't be easier to maintain their provider on their own.
>>>> b) a well-defined lifecycle for providers - which would allow us to identify providers that are obsolete or not popular any more and deprecate them.
>>>>
>>>> Regards, Rafal.
>>>>
>>>>
>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
>>>>> I've been thinking about it - to make up my mind a little. The good thing for me is that I have no strong opinion and I can rather easily see (or so I think) both sides.
>>>>>
>>>>> TL;DR; I think we need an explanation from the "Service Providers" - what they want to achieve by contributing providers to the community - and to see if we can achieve similar results differently.
>>>>>
>>>>> Obviously I am a bit biased from the maintainer point of view, but since I cooperate with various stakeholders, I spoke to some of them just to see their point of view, and this is what I got:
>>>>>
>>>>> Seems that we have really three types of stakeholders that are really interested in "providers":
>>>>>
>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take care of its future and development and the "grand vision" of where we want to be in a few years
>>>>> 2) "Users" - those who use Airflow and the integration with the Service Provider
>>>>> 3) "Service providers" - those who run the services that Airflow integrates with - via providers (that group might also contain those stakeholders that run Airflow "as a service")
>>>>>
>>>>> Let me see it from all the different POVs:
>>>>>
>>>>> From 1) Maintainer POV
>>>>>
>>>>> More providers mean slower growth of the platform overall, as the more providers we add and manage as a community, the less time we can spend on improving the Airflow core.
>>>>> Also, the vision I think we all share is that Airflow is not a "standalone orchestrator" any more - due to its popularity, reach and power, it became an "orchestrating platform", and this is the vision that keeps us - maintainers - busy.
>>>>>
>>>>> Over the last 2 years, pretty much everything we do makes Airflow "more extensible". You can add custom "secrets managers", "timetables", "deferrers", etc. "Customizability" is now built in and the "theme" of being a modern platform.
>>>>> Hell - we even recently added the "Airflow Provider" trove classifier in PyPI: https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider and the main justification in the discussion was that we expect MORE 3rd parties to use it, rather than relying on the "apache-airflow-provider" package name.
>>>>> So from the maintainer POV - having 3rd-party providers as "extensions" to Airflow makes perfect sense and is the way to go.
>>>>>
>>>>> From 2) User POV
>>>>>
>>>>> Users want to use Airflow with all the integrations they use together. But only with those that they actually use.
>>>>> Similarly to maintainers - supporting and needing all 70+ providers is something they usually do not REALLY care about.
>>>>> They literally care about the few providers they use. We even taught the users that they can upgrade and install providers separately from the core. So they already know they can mix and match Airflow + Providers to get what they want.
>>>>>
>>>>> And they do use it - even if they use our image, the image only contains a handful of the providers, and when they need to install new providers - they just install them from PyPI. And for that, the difference between "community providers" and 3rd-party providers - except for the stamp of approval of the ASF - is not really visible.
>>>>> Surely they can use [extras] to install the providers, but that is just a convenience and is definitely not needed by the users.
>>>>> For example, when they build a custom image they usually extend Airflow and simply 'pip install <PROVIDER>'.
>>>>> As long as someone makes sure that the provider can be installed on certain versions of Airflow - it does not matter.
>>>>>
>>>>> Also, from the users' perspective, Airflow became "popular" enough that it no longer needs "more integrations" to be more "appealing" for the users.
>>>>> They already use Airflow. They like it (hopefully), and the fact that this or that provider is part of the community makes no difference any more.
>>>>>
>>>>> From 3) "Service providers" POV
>>>>>
>>>>> Here I am not sure. It's not very clear what service providers get from being part of the "community providers".
>>>>>
>>>>> I hear that some big services (cloud providers) find it cool that we give them the ASF "Stamp of Approval". And they are willing to pay the price of a slower merge process, dependence on the community and following the strict rules of the ASF.
>>>>> And the community is also happy to pay the price of maintaining those (including the dependencies which Elad mentioned) to make sure that all the community providers work in concert - because those "Services" are hugely popular and we "want" as a community to invest there.
>>>>> But maintaining those deps in sync is a huge effort and it will become even worse the more we add. On the other hand, for 3rd-party providers it will be EASIER to keep up.
>>>>> They don't have to care about all the community providers working together; they can choose a subset. And when they release their libraries they can take care of making sure the dependencies are not broken.
>>>>>
>>>>> There are other "drawbacks" to being a "community" provider. For example, we have the rule that we support the minimum Airflow version for community providers for 12 months after that Airflow release.
>>>>> This means that users of Airflow 2.1 will not receive updates for the providers after the 21st of May. This is the price to pay for community-managed providers. We will not release bug fixes in providers or changes for Airflow 2.1 users after the 21st of May.
>>>>> But if you manage your own provider - you can still support 2.0 or even 1.10 if you want.
>>>>>
>>>>> I cannot really see why a Service Provider would want to become an Airflow Community Provider.
>>>>>
>>>>> And I am not really sure what the Flyte, Delta Sharing, Versatile Data Kit, and Cloudera people think and why they think this is the best choice.
>>>>>
>>>>> I think when we understand what the "Service Providers" want to achieve this way, maybe we will be able to come up with some middle ground and at least set some rules for when it makes sense and when it does not make sense.
>>>>> Maybe 'contributing a provider' is the way to achieve something else, and we simply do not realize that in the new "Airflow as a Platform" world, all the stakeholders can achieve very similar results using different approaches.
>>>>>
>>>>> * For example, we could think about how we can make it easier for Airflow users to discover and install their providers - without the community actually taking ownership of the code.
>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a "compliance check" as suggested above.
>>>>> * Or maybe we could introduce a "breeze" extension to be able to install and test a provider in the "latest airflow", so that the service providers could check it before we even release airflow and dependencies.
>>>>>
>>>>> So what I think we really need - Alex, Samhita, Andon, Philippe (I think) - could you tell us (every one of you separately): what were your goals when you came up with the "contribute the new provider" idea?
>>>>>
>>>>> J.
>>>>>
>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> wrote:
>>>>>>
>>>>>> Ash, what is your recommendation for the users, should we follow your suggestion?
>>>>>> This means that the big big big joy of using airflow constraints and getting a working environment with all required providers will be no more.
>>>>>> So users will get a working "Vanilla" Airflow and then will need to figure out how they are going to tackle independent providers that may not be able to coexist with one another.
>>>>>> This means that users will need to create their own constraints mechanism and maintain it.
>>>>>>
>>>>>> From my perspective this increases the complexity of getting Airflow to be production ready.
>>>>>> I know that we say providers vs core, but I think that from the users' perspective providers are an integral part of Airflow.
>>>>>> Having the best scheduler and the best UI is not enough. Providers are a crucial part that completes the set.
>>>>>>
>>>>>> Maybe eventually there should be something like a provider store where there can be official providers and 3rd-party providers.
>>>>>>
>>>>>> This may be an even greater discussion than what we are having here. It feels more like Airflow as a product vs Airflow as an ecosystem.
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty <col...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think in an ideal world even the providers currently part of the Airflow repo would be managed separately. (I'm not actually suggesting removing any providers.) I don't think it's a matter of gatekeeping; I just think it's actually kind of odd to have providers in the same repo as core Airflow, and it increases confusion about Airflow versions vs provider package versions.
>>>>>>>
>>>>>>> Collin McNulty
>>>>>>>
>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <turbas...@apache.org> wrote:
>>>>>>>>
>>>>>>>> I’m leaning toward Ash's approach. Having providers maintain the packages may streamline many aspects for providers/companies.
>>>>>>>>
>>>>>>>> 1. They are owners, so they can merge and release whenever they need.
>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources needed for running them.
>>>>>>>> 3. The development of the package can be incorporated into their company processes - not every company is used to OSS mode.
>>>>>>>>
>>>>>>>> Whatever way we go - we should have some basic guidelines and requirements (for example, to brand a provider as "recommended by the community" or something).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Tomsk
>>>
>>>
>>> --
>>> With best wishes, Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
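As a footnote to the "3rd-party providers as extensions" point above: a standalone provider plugs into Airflow through the apache_airflow_provider entry point, whose callable returns the provider metadata. A minimal sketch under that assumption; module, package, and service names are hypothetical:

    # example_provider/__init__.py -- sketch of the provider-info callable a
    # standalone provider exposes to Airflow. It is wired up in the package
    # metadata via an entry point such as:
    #   entry_points={"apache_airflow_provider": ["provider_info=example_provider:get_provider_info"]}
    # All names below are hypothetical.


    def get_provider_info():
        """Return the metadata Airflow's providers manager reads at runtime."""
        return {
            "package-name": "example-airflow-provider-foo",  # hypothetical PyPI name
            "name": "Foo",                                    # human-readable provider name
            "description": "Hooks and operators for a hypothetical Foo service.",
            "versions": ["1.0.0"],
        }

With that in place the package stays entirely outside the Airflow repo, yet still shows up in "airflow providers list" once installed - which is essentially the "extension" model discussed in the thread.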