I thought a bit about it, and I think that with Astronomer behind it, it checks all the boxes, provided that we also have some (super simple) dashboard similar to the MWAA one (https://aws-mwaa.github.io/open-source/system-tests/dashboard.html).
From https://github.com/apache/airflow/blob/main/PROVIDERS.rst#rd-party-providers :

> While we already have - historically - a number of 3rd-party service providers managed by the community, most of those services have dedicated teams that keep an eye on the community providers and not only take active part in managing them (see mixed-governance model below), but also provide a way that we can verify whether the provider works with the latest version of the service via dashboards that show status of System Tests for the provider. This allows us to have a high level of confidence that when we release the provider it works with the latest version of the service. System Tests are part of the Airflow code, but they are executed and verified by those 3rd party service teams. We are working with the 3rd party service teams (who are often important stakeholders of the Apache Airflow project) to add dashboards for the historical providers that are managed by the community, and current set of Dashboards can be also found at the Ecosystem: system test dashboards.

Whenever someone (including Weaviate in the past) asked if they could contribute a provider, we always referred them to that chapter and said: "we need to have a good reason" and "we need to have confidence that the integration will not break in the future". So there are two conditions, IMHO:

1) Having a good reason why we want it in.
2) Having confidence that we can keep the integration "working" in the future without a lot of overhead and without having to pay for the integration.

Re 1) I think there is a very good reason why we want to have those in the community: LLMs are all the rage, making LLMs first-class citizens in Airflow is a no-brainer, and Kaxil laid it out nicely in the email.

Re 2) I think this is a great opportunity for Astronomer to take on the "3rd-party maintenance" role and follow the "System Test dashboard" idea. Of course Astronomer is going to be committed to it - no doubt about it :).
And I believe Astronomer already regularly runs similar tests on Airflow managed instances, running Airflow test cases (and more or less complex DAGs). As long as we have some basic example_dags/system_tests added for those providers, and they are run regularly on Astronomer-managed instances with accounts for Weaviate and the others configured, plus some simple dashboard where we can see the status of those DAG runs, we should be good to go.

Not everyone here is aware of it, but a number of issues have already been fixed by the MWAA team simply because they were alerted by the regular system tests, and they were able to fix those issues before they made their way into new releases. I, for one, usually take a quick look at the dashboard before releasing new providers, and it gives me quite a lot of confidence that no "serious" issues are overlooked. Seeing a whole week of "all green" there is reassuring. It was quite an effort from the MWAA team to implement it and keep it running, but I think the scope and complexity of the LLM integrations is much lower. Those example DAGs should be far more stable and straightforward for Astronomer to run, because the LLM cases are generally much simpler than the "infrastructure" cases that the many services of the AWS integration require.

It could even be a super simple dump of HTML to a public S3 bucket, like MWAA does (for example, using Airflow to run it and the Airflow API to retrieve the status), plus some alerting on the Astronomer side to detect (and fix before release) any issues. That would be more than enough and would check all the boxes for me.

J.
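To make the "super simple dashboard" idea a bit more concrete, here is a rough sketch of what it might look like. It is only a minimal illustration, assuming the Airflow 2 stable REST API (`GET /api/v1/dags/{dag_id}/dagRuns`) with basic auth enabled. The base URL, credentials, DAG ids and the helper names `fetch_dag_runs` and `render_dashboard` are all hypothetical, and the upload of the resulting HTML to a public S3 bucket is left out.

```python
import base64
import html
import json
import urllib.request

# Hypothetical color scheme for DAG run states shown in the dashboard.
STATE_COLORS = {"success": "#2e7d32", "failed": "#c62828", "running": "#f9a825", "queued": "#9e9e9e"}


def fetch_dag_runs(base_url: str, dag_id: str, user: str, password: str) -> list[dict]:
    """Fetch runs of one DAG via the Airflow 2 stable REST API (assumes basic auth is enabled)."""
    url = f"{base_url}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["dag_runs"]


def render_dashboard(runs_by_dag: dict[str, list[dict]]) -> str:
    """Render a static HTML table: one row per DAG, one colored cell per run, oldest first."""
    rows = []
    for dag_id, runs in sorted(runs_by_dag.items()):
        # ISO-8601 timestamps with the same UTC offset sort correctly as strings.
        ordered = sorted(runs, key=lambda run: run.get("logical_date") or "")
        cells = []
        for run in ordered:
            state = run.get("state") or "unknown"
            color = STATE_COLORS.get(state, "#ffffff")
            date = html.escape(run.get("logical_date") or "")
            cells.append(f'<td style="background:{color}" title="{date}">{html.escape(state)}</td>')
        rows.append(f"<tr><th>{html.escape(dag_id)}</th>{''.join(cells)}</tr>")
    return "<html><body><table>\n" + "\n".join(rows) + "\n</table></body></html>"


if __name__ == "__main__":
    # Offline usage with sample data shaped like the REST API "dag_runs" entries.
    sample = {
        "example_weaviate": [
            {"state": "success", "logical_date": "2023-10-16T00:00:00+00:00"},
            {"state": "failed", "logical_date": "2023-10-17T00:00:00+00:00"},
        ]
    }
    print(render_dashboard(sample))
```

Keeping the output as a single static HTML file means it can be dropped into a public bucket and viewed without running any service, which is roughly the property that makes the MWAA dashboard cheap to maintain.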
On Tue, Oct 17, 2023 at 8:42 PM Kaxil Naik <kaxiln...@apache.org> wrote:
> Hey Everyone,
>
> As a follow-up to my Keynote talk, Building and deploying LLM applications
> with Apache Airflow <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I
> am formally proposing the addition of these 5 providers to the Apache
> Airflow repo:
>
> - PgVector <https://github.com/pgvector/pgvector>
> - Weaviate <https://weaviate.io/>
> - Pinecone <https://www.pinecone.io/>
> - OpenAI <https://openai.com/>
> - Cohere <https://cohere.com/>
>
> Advancements in LLMs are moving at a rapid pace & transforming the way we
> work and our industry. Although LLMs are simple to use in prototyping,
> using LLM for enterprise applications and for production still presents a
> lot of challenges. These
> <https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8>
> are some of the same problems that we tackle in Data Engineering, and
> Airflow is a natural fit for them.
>
> We at Astronomer would like to add first-class support for the popular LLMs
> (OpenAI & Cohere) and vector DBs (PgVector, Weaviate & Pinecone) so that
> Data Scientists and ML engineers can utilize them natively with easy-to-use
> Operator & Hook abstractions while providing a native (and
> Production-ready) approach for Authentication, retries, logging etc.
>
> We also think this is vital for the Apache Airflow project as we, the
> project, embrace the LLM tide and continue to be a great example of
> balancing innovation and maintaining backward-compatibility.
>
> The first versions of these providers will enable building one of the most
> common use cases of LLMs i.e. Question and Answering / Chatbots using
> Retrieval-augmented generation (RAG) done with the help of embeddings.
>
> Everyone is welcome and encouraged to contribute once the PRs are merged.
>
> Astronomer is committed to maintaining these providers in the Airflow repo,
> including reviewing PRs, maintaining code quality, testing and keeping the
> APIs up-to-date.
>
> Note: PgVector <https://github.com/pgvector/pgvector> is an open-source
> project, so we don’t need a formal vote for it as per our guidelines
> <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>.
> So please consider this email as seeking a Lazy Consensus for it.
>
> I will open up a VOTING thread after discussing this for a few days.
>
> Thanks.
>
> Regards,
>
> Kaxil