Because 4 out of 5 new providers already have a draft PR, I would like to raise a question here that relates to all of the new providers, just to avoid asking the same question in every PR.
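
To make the question below concrete, here is a rough, stdlib-only sketch of the two styles being compared. Nothing in it is taken from Airflow or from the linked PRs: `FakeEmbeddingHook`, `ThinEmbeddingOperator` and `embed_texts` are hypothetical stand-ins, and the "embedding" is just a placeholder number.

```python
# Stdlib-only illustration; every name here is a hypothetical stand-in,
# not an Airflow class and not code from any of the provider PRs.
from dataclasses import dataclass


class FakeEmbeddingHook:
    """Stand-in for a provider hook (real connection handling omitted)."""

    def __init__(self, conn_id: str = "fake_default") -> None:
        self.conn_id = conn_id

    def create_embeddings(self, texts: list[str]) -> list[list[float]]:
        # Placeholder "embedding": a single number per text (its length).
        return [[float(len(text))] for text in texts]


@dataclass
class ThinEmbeddingOperator:
    """Style 1: a dedicated operator whose execute() only forwards to the hook."""

    task_id: str
    texts: list[str]
    conn_id: str = "fake_default"

    def execute(self, context: dict) -> list[list[float]]:
        return FakeEmbeddingHook(self.conn_id).create_embeddings(self.texts)


def embed_texts(texts: list[str], conn_id: str = "fake_default") -> list[list[float]]:
    """Style 2: a plain callable, the kind one would pass to PythonOperator
    or wrap as a TaskFlow task."""
    return FakeEmbeddingHook(conn_id).create_embeddings(texts)


texts = ["hello", "airflow"]
via_operator = ThinEmbeddingOperator(task_id="embed", texts=texts).execute(context={})
via_callable = embed_texts(texts)
print(via_operator == via_callable)
```

In this toy version both styles end up in the same hook call, so the operator adds a class and an `execute()` method without adding behaviour. Whether the real operators in the PRs do more than that is exactly what I am asking.
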
Do we actually want to make the new operators kind of like "PythonOperator"? Maybe I am missing something important, but I can't see why that would work better than running hook methods inside PythonOperator / TaskFlow.

For reference:

Add Cohere Provider: https://github.com/apache/airflow/pull/34921#discussion_r1358525838
Enable pgvector support for Postgres provider: https://github.com/apache/airflow/pull/34891#discussion_r1362910782
Add OpenAI Provider: https://github.com/apache/airflow/pull/35023#discussion_r1365235167
Add Weaviate Provider: https://github.com/apache/airflow/pull/35060/files#r1365765741

----
Best Wishes
*Andrey Anshin*


On Tue, 17 Oct 2023 at 22:42, Kaxil Naik <kaxiln...@apache.org> wrote:

> Hey Everyone,
>
> As a follow-up to my Keynote talk, Building and deploying LLM applications
> with Apache Airflow <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I
> am formally proposing the addition of these 5 providers to the Apache
> Airflow repo:
>
> - PgVector <https://github.com/pgvector/pgvector>
> - Weaviate <https://weaviate.io/>
> - Pinecone <https://www.pinecone.io/>
> - OpenAI <https://openai.com/>
> - Cohere <https://cohere.com/>
>
> Advancements in LLMs are moving at a rapid pace & transforming the way we
> work and our industry. Although LLMs are simple to use in prototyping,
> using LLM for enterprise applications and for production still presents a
> lot of challenges. These
> <https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8>
> are some of the same problems that we tackle in Data Engineering, and
> Airflow is a natural fit for them.
>
> We at Astronomer would like to add first-class support for the popular LLMs
> (OpenAI & Cohere) and vector DBs (PgVector, Weaviate & Pinecone) so that
> Data Scientists and ML engineers can utilize them natively with easy-to-use
> Operator & Hook abstractions while providing a native (and
> Production-ready) approach for Authentication, retries, logging etc.
>
> We also think this is vital for the Apache Airflow project as we, the
> project, embrace the LLM tide and continue to be a great example of
> balancing innovation and maintaining backward-compatibility.
>
> The first versions of these providers will enable building one of the most
> common use cases of LLMs i.e. Question and Answering / Chatbots using
> Retrieval-augmented generation (RAG) done with the help of embeddings.
>
> Everyone is welcome and encouraged to contribute once the PRs are merged.
> Astronomer is committed to maintaining these providers in the Airflow repo,
> including reviewing PRs, maintaining code quality, testing and keeping the
> APIs up-to-date.
>
> Note: PgVector <https://github.com/pgvector/pgvector> is an open-source
> project, so we don’t need a formal vote for it as per our guidelines
> <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>.
> So please consider this email as seeking a Lazy Consensus for it.
>
> I will open up a VOTING thread after discussing this for a few days.
>
> Thanks.
>
> Regards,
>
> Kaxil