Two fast/easy ways to find them

1. https://registry.astronomer.io/
2. Using the new classifier 
https://pypi.org/search/?o=&c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
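
A minimal sketch of how a 3rd-party provider package would show up in that
classifier search - the package name, version and Airflow pin below are made
up for illustration. The trove classifier is what the search filters on, and
the "apache_airflow_provider" entry point is (as far as I know) how Airflow
discovers provider metadata; everything else is just an example:

    # setup.py of a hypothetical 3rd-party provider distribution
    from setuptools import setup, find_packages

    setup(
        name="example-airflow-provider-foo",         # hypothetical name
        version="1.0.0",
        packages=find_packages(),
        install_requires=["apache-airflow>=2.2.0"],  # assumed minimum Airflow version
        classifiers=[
            # The classifier the PyPI search above filters on
            "Framework :: Apache Airflow",
            "Framework :: Apache Airflow :: Provider",
        ],
        entry_points={
            # Entry point group Airflow uses to discover provider metadata at runtime
            "apache_airflow_provider": [
                "provider_info=example_provider.__init__:get_provider_info"
            ]
        },
    )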

On 25 April 2022 18:08:49 BST, "Ferruzzi, Dennis" <ferru...@amazon.com.INVALID> 
wrote:
>I still think that easy inclusion with a defined pruning process is best, but 
>it's looking like that is the minority opinion.  In which case, IFF we are 
>going to be keeping them separate then I definitely agree that there needs to 
>be a fast/easy/convenient way to find them.
>
>
>________________________________________
>From: Jarek Potiuk <ja...@potiuk.com>
>Sent: Monday, April 25, 2022 7:17 AM
>To: dev@airflow.apache.org
>Subject: RE: [EXTERNAL][DISCUSS] Approach for new providers of the community
>
>Just to come back to it (please, everyone, a little patience - I think
>some people have not chimed in yet due to the 2.3.0 "focus", so this
>discussion might take a little more time).
>
>My current thinking on it so far:
>
>* I am not really in the camp of "let's not add any more providers at
>all", and also not in the "let's accept all providers that are good
>quality code" camp. I think there are a few providers which, "after
>fulfilling all the criteria", could be added - mostly open-source
>standards, generic, established technologies - but it should be a
>rather limited and rare event.
>
>* when there is a proprietary service which does not have a very broad
>reach and it's not likely that we will have some committers who will be
>maintaining it - because they are users - the default option should
>be to make a standalone per-service provider. The difficulty here is
>to set the right "non-quality" criteria - but I think we really want
>to limit any new code to maintain. Here maybe we can have some more
>concrete criteria proposed - so that we do not have to vote
>individually on each proposed provider - and so that those who want
>to propose a provider could check for themselves, by reading the
>criteria, what's best for them.
>
>* we might improve our "providers" list on the "Ecosystem" page to make
>providers stand out a bit more (maybe simply put them on top and make
>a clearly visible section). We are not going to maintain and keep a
>nice "registry" similar to Astronomer's (we could even actually
>make the link to the Astronomer registry more prominent as the way to
>"search" for providers on our Ecosystem page). We could also add a link
>to PyPI with the "Airflow Provider" classifier on the Ecosystem page
>as another way of searching for providers. All that is, I think,
>perfectly fine with the ASF policies and spirit. And it will be good for
>discovery.
>
>WDYT?
>
>J.
>
>On Mon, Apr 18, 2022 at 3:59 PM Samhita Alla <samh...@union.ai> wrote:
>>
>> Hello!
>>
>> The reason behind submitting the Flyte provider to the Airflow repository is
>> that we felt it'd be effortless for the Airflow users to use the
>> integration. Moreover, since it'd be under the umbrella of Airflow, we
>> estimated that the Airflow users would not hesitate to use the provider.
>>
>> We could definitely have this as a standalone provider, but the 
>> easy-to-get-started incentive of Airflow providers seemed like a better 
>> option.
>>
>> If there's a sophisticated plan in place for having standalone providers in 
>> PyPI, we're up for it.
>>
>> Thanks,
>> Samhita
>>
>> On Wed, Apr 13, 2022 at 9:58 PM Alex Ott <alex...@gmail.com> wrote:
>>>
>>> Hello all
>>>
>>> I want to try to explain the motivation behind the submission of the Delta
>>> Sharing provider:
>>>
>>> Let me start with the fact that the original issue was created against the
>>> Airflow repository, and it was accepted as potential new functionality. And
>>> the discussion about new providers started almost on the day the PR was
>>> submitted :-)
>>> Delta Sharing is an OSS project under the umbrella of the Linux Foundation
>>> that defines a protocol and reference implementations. It was started by
>>> Databricks, but has other contributors as well - that's why it wasn't
>>> pushed into the Databricks provider, as it's not specific to Databricks.
>>> Another thought behind submitting it as a separate provider was to get more
>>> people interested in this functionality and build additional integrations
>>> on top of it.
>>> Another important aspect of having providers in the Airflow repository is
>>> that they are tested together with changes in the core of Airflow.
>>>
>>> I completely understand the concerns about more maintenance effort, but my
>>> personal point of view (more about it below) is similar to Rafal's & John's -
>>> if there are well-defined criteria & plans for decommissioning or something
>>> like that, then providers could be part of the releases, etc.
>>>
>>> I just want to add that although I'm employed by Databricks, I'm not a part
>>> of the development team - I'm on the field team that works with customers,
>>> sees how they are using different tools, sees their pain points, etc. Most of
>>> the work so far was done on my own time - I'm doing some coordination, but
>>> most of the new functionality (AAD token support, Repos, Databricks SQL
>>> operators, etc.) comes from seeing customers using Airflow together with
>>> Databricks.
>>>
>>>
>>> On Mon, Apr 11, 2022 at 9:14 PM Rafal Biegacz 
>>> <rafalbieg...@google.com.invalid> wrote:
>>>>
>>>> Hi,
>>>>
>>>> I think that we will need to find some middle ground here - we are trying
>>>> to optimize in many dimensions (Jarek mentioned 3 of them). Maybe I would
>>>> also add a 4th dimension - Airflow Service Provider :).
>>>>
>>>> Airflow users - whether they do self-managed Airflow or use "managed
>>>> Airflow" provided by others - are beneficiaries of the fact that Airflow has
>>>> a decent portfolio of providers.
>>>> It's not only a guarantee that these providers should work fine and that
>>>> they meet Airflow coding/testing standards. It's also a kind of guarantee
>>>> that, once they start using Airflow
>>>> with providers backed by the Airflow community, they won't be on their own
>>>> when it comes to troubleshooting/updating/etc. It will be much easier for
>>>> them to convince their companies to use Airflow for production use cases,
>>>> as the Airflow platform (core + providers) is tested/maintained by the
>>>> Airflow community.
>>>>
>>>> Keeping providers within the Airflow repository generates integration and
>>>> maintenance work on the Airflow community side. On the other hand, if this
>>>> work is not done within the community, then this effort would need to be
>>>> done by all users to a certain extent. So from this perspective it's more
>>>> optimal for the community to do it, so users can use off-the-shelf Airflow
>>>> for the majority of their use cases.
>>>>
>>>> When it comes to accepting new providers - I like John's suggestions:
>>>> a) a well-defined standard that needs to be met by providers - passing the
>>>> "provider qualification" would take some effort, so each service provider
>>>> would need to decide whether it wouldn't be easier to maintain their
>>>> provider on their own.
>>>> b) a well-defined lifecycle for providers - which would allow us to identify
>>>> providers that are obsolete/no longer popular and retire them.
>>>>
>>>> Regards, Rafal.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 11, 2022 at 6:47 PM Jarek Potiuk <ja...@potiuk.com> wrote:
>>>>>
>>>>> I've been thinking about it - to make up my mind a little. The good thing
>>>>> for me is that I have no strong opinion and I can rather easily see (or
>>>>> so I think) both sides.
>>>>>
>>>>> TL;DR: I think we need an explanation from the "Service Providers" - what
>>>>> they want to achieve by contributing providers to the community - and then
>>>>> see if we can achieve similar results differently.
>>>>>
>>>>>
>>>>> Obviously I am a bit biased from the maintainer point of view, but since
>>>>> I cooperate with various stakeholders, I spoke to some of them just to see
>>>>> their point of view, and this is what I got:
>>>>>
>>>>> It seems that we really have three types of stakeholders that are
>>>>> interested in "providers":
>>>>>
>>>>> 1) "Maintainers" - those who mostly maintain Airflow and have to take
>>>>> care of its future and development and the "grand vision" of where we want
>>>>> to be in a few years
>>>>> 2) "Users" - those who use Airflow and the integrations with the Service
>>>>> Providers
>>>>> 3) "Service providers" - those who run the services that Airflow
>>>>> integrates with - via providers (that group might also contain those
>>>>> stakeholders that run Airflow "as a service")
>>>>>
>>>>> Let me see it from all the different POVs:
>>>>>
>>>>>
>>>>> From 1) Maintainer POV
>>>>>
>>>>> More providers mean slower growth of the platform overall, as the more
>>>>> providers we add and manage as a community, the less time we can spend on
>>>>> improving the Airflow core.
>>>>> Also, the vision I think we all share is that Airflow is not a "standalone
>>>>> orchestrator" any more - due to its popularity, reach and power, it has
>>>>> become an "orchestrating platform", and this is the vision that keeps us -
>>>>> maintainers - busy.
>>>>>
>>>>> Over the last 2 years pretty much everything we do makes Airflow "more
>>>>> extensible". You can add custom "secrets managers", "timetables",
>>>>> "deferrers", etc. "Customizability" is now built-in and a "theme" of being
>>>>> a modern platform.
>>>>> Hell - we even recently added the "Airflow Provider" trove classifier in
>>>>> PyPI:
>>>>> https://pypi.org/search/?c=Framework+%3A%3A+Apache+Airflow+%3A%3A+Provider
>>>>> and the main justification in the discussion was that we expect MORE
>>>>> 3rd parties to use it, rather than relying on the "apache-airflow-provider"
>>>>> package name.
>>>>> So from the maintainer POV - having 3rd-party providers as "extensions" to
>>>>> Airflow makes perfect sense and is the way to go.
>>>>>
>>>>>
>>>>> From 2) User POV
>>>>>
>>>>> Users want to use Airflow together with all the integrations they use.
>>>>> But only with those that they actually use. Similarly to maintainers -
>>>>> supporting and needing all 70+ providers is something they usually do not
>>>>> REALLY care about.
>>>>> They literally care about the few providers they use. We even taught the
>>>>> users that they can upgrade and install providers separately from the
>>>>> core. So they already know they can mix and match Airflow + providers to
>>>>> get what they want.
>>>>>
>>>>> And they do use it - even if they use our image, the image only contains
>>>>> a handful of the providers, and when they need to install
>>>>> new providers - they just install them from PyPI. And for that, the
>>>>> difference between "community providers" and 3rd-party providers - except
>>>>> for the stamp of approval of the ASF - is not really visible.
>>>>> Surely they can use [extras] to install the providers, but that is just a
>>>>> convenience and is definitely not needed by the users.
>>>>> For example, when they build a custom image they usually extend Airflow
>>>>> and simply 'pip install <PROVIDER>'.
>>>>> As long as someone makes sure that the provider can be installed on
>>>>> certain versions of Airflow - it does not matter.
>>>>>
>>>>> Also, from the users' perspective, Airflow has become "popular" enough
>>>>> that it no longer needs "more integrations" to be more "appealing" to users.
>>>>> They already use Airflow. They like it (hopefully), and the fact that this
>>>>> or that provider is part of the community makes no difference any more.
>>>>>
>>>>>
>>>>> From 3) "Service providers" POV
>>>>>
>>>>> Here I am not sure. It's not very clear what service providers get from 
>>>>> being part of the "community providers".
>>>>>
>>>>> I hear that some big services (cloud providers) find it cool that we give
>>>>> them the ASF "Stamp of Approval". And they are willing to pay the price of
>>>>> a slower merge process, dependence on the community and following the
>>>>> strict rules of the ASF.
>>>>> And the community is also happy to pay the price of maintaining those
>>>>> (including the dependencies which Elad mentioned) to make sure that all the
>>>>> community providers work in concert - because those "Services" are hugely
>>>>> popular and we "want" as a community to invest there.
>>>>> But keeping those deps in sync is a huge effort and it will become
>>>>> even worse the more we add. On the other hand, for 3rd-party providers
>>>>> it will be EASIER to keep up.
>>>>> They don't have to care about all the community providers working
>>>>> together; they can choose a subset. And when they release their libraries
>>>>> they can take care of making sure the dependencies are not broken.
>>>>>
>>>>> There are other "drawbacks" to being a "community" provider. For example,
>>>>> we have the rule that we support a given Airflow version as the minimum
>>>>> version for community providers for 12 months after that Airflow release.
>>>>> This means that users of Airflow 2.1 will not receive updates for the
>>>>> providers after the 21st of May. This is the price to pay for
>>>>> community-managed providers. We will not release bug fixes in providers
>>>>> or changes for Airflow 2.1 users after the 21st of May.
>>>>> But if you manage your own provider - you can still support 2.0 or even
>>>>> 1.10 if you want.
>>>>>
>>>>> I cannot really see why a Service Provider would want to become an 
>>>>> Airflow Community Provider.
>>>>>
>>>>> And I am not really sure what Flyte, Delta Sharing, Versatile Data Kit,
>>>>> and Cloudera people think and why they think this is the best choice.
>>>>>
>>>>> I think when we understand what the "Service Providers" want to achieve
>>>>> this way, maybe we will be able to come up with some middle ground and at
>>>>> least set some rules for when it makes sense and when it does not.
>>>>> Maybe 'contributing a provider' is a way to achieve something else, and we
>>>>> simply do not realize that in the new "Airflow as a Platform" world, all
>>>>> the stakeholders can achieve very similar results using different
>>>>> approaches.
>>>>>
>>>>> * For example, we could think about how we can make it easier for Airflow
>>>>> users to discover and install their providers - without the community
>>>>> actually taking ownership of the code.
>>>>> * Or maybe we could introduce a tool to make a 3rd-party provider pass a
>>>>> "compliance check" as suggested above (a rough sketch is below)
>>>>> * Or maybe we could introduce a "breeze" extension to be able to install
>>>>> and test a provider against the "latest Airflow", so that the service
>>>>> providers could check it before we even release Airflow and dependencies
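>>>>>
>>>>> To make the "compliance check" idea a bit more concrete, here is a very
>>>>> rough, illustrative sketch of the kind of thing such a tool could verify
>>>>> for an installed 3rd-party provider distribution. The two checks below
>>>>> (trove classifier + "apache_airflow_provider" entry point) are only my
>>>>> assumption of a reasonable minimum - the actual criteria would need to be
>>>>> agreed on:
>>>>>
>>>>>     # Sketch only - not an existing tool. Assumes Python 3.8+ (stdlib importlib.metadata).
>>>>>     from importlib import metadata
>>>>>
>>>>>     def check_provider(dist_name: str) -> list:
>>>>>         """Return a list of problems found for the given installed distribution."""
>>>>>         problems = []
>>>>>         dist = metadata.distribution(dist_name)
>>>>>
>>>>>         # 1. The package should advertise itself with the provider trove classifier,
>>>>>         #    so it is discoverable via the PyPI search linked earlier in this thread.
>>>>>         classifiers = dist.metadata.get_all("Classifier") or []
>>>>>         if "Framework :: Apache Airflow :: Provider" not in classifiers:
>>>>>             problems.append("missing 'Framework :: Apache Airflow :: Provider' classifier")
>>>>>
>>>>>         # 2. It should expose provider metadata via the "apache_airflow_provider"
>>>>>         #    entry point, which is how Airflow discovers providers at runtime.
>>>>>         if not any(ep.group == "apache_airflow_provider" for ep in dist.entry_points):
>>>>>             problems.append("no 'apache_airflow_provider' entry point")
>>>>>
>>>>>         return problems
>>>>>
>>>>>     # Example usage (the distribution must be installed locally):
>>>>>     # print(check_provider("apache-airflow-providers-http"))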
>>>>>
>>>>> So here is what I think we really need - Alex, Samhita, Andon, Philippe (I
>>>>> think) - could you tell us (every one of you separately) what your goals
>>>>> were when you came up with the "contribute the new provider" idea?
>>>>>
>>>>> J.
>>>>>
>>>>> On Wed, Apr 6, 2022 at 11:51 PM Elad Kalif <elad...@apache.org> wrote:
>>>>>>
>>>>>> Ash, what is your recommendation for the users, should we follow your
>>>>>> suggestion?
>>>>>> This means that the big big big joy of using Airflow constraints and
>>>>>> getting a working environment with all required providers will be no
>>>>>> more.
>>>>>> So users will get a working "vanilla" Airflow and then will need to
>>>>>> figure out how they are going to tackle independent providers that may
>>>>>> not be able to coexist with one another.
>>>>>> This means that users will need to create their own constraints
>>>>>> mechanism and maintain it.
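>>>>>>
>>>>>> Just to illustrate what "their own constraints mechanism" could mean in
>>>>>> practice, here is a minimal, purely illustrative sketch: keep your own
>>>>>> pinned constraints file and verify the installed environment against it.
>>>>>> The file name and the simple "package==version" format are assumptions
>>>>>> on my side, not an existing Airflow feature:
>>>>>>
>>>>>>     # Sketch of a home-grown constraints check (Python 3.8+, stdlib only).
>>>>>>     from importlib import metadata
>>>>>>
>>>>>>     def check_constraints(path: str) -> None:
>>>>>>         """Warn about installed packages that drifted from the pinned versions."""
>>>>>>         for line in open(path):
>>>>>>             line = line.strip()
>>>>>>             if not line or line.startswith("#") or "==" not in line:
>>>>>>                 continue
>>>>>>             name, _, pinned = line.partition("==")
>>>>>>             try:
>>>>>>                 installed = metadata.version(name)
>>>>>>             except metadata.PackageNotFoundError:
>>>>>>                 continue  # not installed, nothing to verify
>>>>>>             if installed != pinned:
>>>>>>                 print(f"{name}: installed {installed}, but constraints pin {pinned}")
>>>>>>
>>>>>>     # Example usage, assuming a hypothetical file maintained by the user:
>>>>>>     # check_constraints("my-airflow-constraints.txt")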
>>>>>>
>>>>>> From my perspective this increases the complexity of getting Airflow to
>>>>>> be production ready.
>>>>>> I know that we say providers vs. core, but I think that from the users'
>>>>>> perspective providers are an integral part of Airflow.
>>>>>> Having the best scheduler and the best UI is not enough. Providers are a
>>>>>> crucial part that completes the set.
>>>>>>
>>>>>> Maybe eventually there should be something like a provider store where
>>>>>> there can be official providers and 3rd-party providers.
>>>>>>
>>>>>> This may be an even greater discussion than the one we are having here.
>>>>>> It feels more like Airflow as a product vs. Airflow as an ecosystem.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Thu, Apr 7, 2022 at 12:27 AM Collin McNulty 
>>>>>> <col...@astronomer.io.invalid> wrote:
>>>>>>>
>>>>>>> I agree with Ash and Tomasz. If it were not for the history, I think in 
>>>>>>> an ideal world even the providers currently part of the Airflow repo 
>>>>>>> would be managed separately. (I'm not actually suggesting removing any 
>>>>>>> providers.) I don't think it's a matter of gatekeeping, I just think 
>>>>>>> it's actually kind of odd to have providers in the same repo as core 
>>>>>>> Airflow, and it increases confusion about Airflow versions vs provider 
>>>>>>> package versions.
>>>>>>>
>>>>>>> Collin McNulty
>>>>>>>
>>>>>>> On Wed, Apr 6, 2022 at 4:21 PM Tomasz Urbaszek <turbas...@apache.org> 
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> I'm leaning toward Ash's approach. Having providers maintain the
>>>>>>>> packages may streamline many aspects for providers/companies.
>>>>>>>>
>>>>>>>> 1. They are owners so they can merge and release whenever they need.
>>>>>>>> 2. It’s easier for them to add E2E tests and manage the resources 
>>>>>>>> needed for running them.
>>>>>>>> 3. The development of the package can be incorporated into their 
>>>>>>>> company processes - not every company is used to OSS mode.
>>>>>>>>
>>>>>>>> Whatever way we go - we should have some basic guidelines and
>>>>>>>> requirements (for example, to brand a provider as "recommended by the
>>>>>>>> community" or something).
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>> Tomsk
>>>
>>>
>>>
>>> --
>>> With best wishes,                    Alex Ott
>>> http://alexott.net/
>>> Twitter: alexott_en (English), alexott (Russian)
