Re: [DISCUSSION] Add 5 new Providers to enable first-class LLMOps

Andrey Anshin Thu, 19 Oct 2023 09:04:43 -0700

Because 4 out of 5 new providers have a draft PR, I would like to raise a
question that relates to all of the new providers, just to avoid asking the
same question in every PR.


Do we actually want to make the new operators a kind of "PythonOperator"?
Maybe I am missing something important, but I can't see why that would work
better than running hook methods inside a PythonOperator / TaskFlow task.
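To make the question concrete, here is a minimal sketch of the two styles in
plain Python, with no Airflow imports. FakeEmbeddingHook, FakeEmbeddingOperator
and embed_text are hypothetical names, not the API proposed in any of the PRs.
The first style is an operator whose execute() only forwards its arguments to a
single hook method; the second is a plain callable that calls the same hook
method and could be passed to a PythonOperator or decorated with @task.

```python
from typing import Any


# Hypothetical stand-in for a provider hook (an LLM or vector DB client wrapper).
# This is NOT the real API of any Airflow provider.
class FakeEmbeddingHook:
    def __init__(self, conn_id: str) -> None:
        self.conn_id = conn_id

    def create_embeddings(self, text: str) -> list[float]:
        # Toy "embedding" (scaled character codes) standing in for a remote API call.
        return [ord(c) / 1000 for c in text]


# Style 1 (hypothetical): a dedicated operator whose execute() only forwards
# its constructor arguments to one hook method.
class FakeEmbeddingOperator:
    def __init__(self, conn_id: str, text: str) -> None:
        self.conn_id = conn_id
        self.text = text

    def execute(self, context: dict[str, Any]) -> list[float]:
        hook = FakeEmbeddingHook(self.conn_id)
        return hook.create_embeddings(self.text)


# Style 2 (hypothetical): a plain callable that calls the same hook method,
# as one would pass to PythonOperator or decorate with @task in TaskFlow.
def embed_text(conn_id: str, text: str) -> list[float]:
    return FakeEmbeddingHook(conn_id).create_embeddings(text)
```

Both styles return the same value. That is the point of the question: if the
operator only forwards arguments to the hook, it adds little over calling the
hook from a TaskFlow task.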

For reference:
Add Cohere Provider:
https://github.com/apache/airflow/pull/34921#discussion_r1358525838
Enable pgvector support for Postgres provider:
https://github.com/apache/airflow/pull/34891#discussion_r1362910782
Add OpenAI Provider:
https://github.com/apache/airflow/pull/35023#discussion_r1365235167
Add Weaviate Provider:
https://github.com/apache/airflow/pull/35060/files#r1365765741

----
Best Wishes
*Andrey Anshin*



On Tue, 17 Oct 2023 at 22:42, Kaxil Naik <kaxiln...@apache.org> wrote:

> Hey Everyone,
>
> As a follow-up to my Keynote talk, Building and deploying LLM applications
> with Apache Airflow <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I
> am formally proposing the addition of these 5 providers to the Apache
> Airflow repo:
>
>    - PgVector <https://github.com/pgvector/pgvector>
>    - Weaviate <https://weaviate.io/>
>    - Pinecone <https://www.pinecone.io/>
>    - OpenAI <https://openai.com/>
>    - Cohere <https://cohere.com/>
>
>
> Advancements in LLMs are moving at a rapid pace & transforming the way we
> work and our industry. Although LLMs are simple to use in prototyping,
> using LLMs for enterprise applications and in production still presents a
> lot of challenges. These
> <https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8>
> are some of the same problems that we tackle in Data Engineering, and
> Airflow is a natural fit for them.
>
> We at Astronomer would like to add first-class support for the popular LLMs
> (OpenAI & Cohere) and vector DBs (PgVector, Weaviate & Pinecone) so that
> Data Scientists and ML engineers can use them natively through easy-to-use
> Operator & Hook abstractions, with a native (and production-ready) approach
> to authentication, retries, logging, etc.
>
> We also think this is vital for the Apache Airflow project as we, the
> project, embrace the LLM tide and continue to be a great example of
> balancing innovation with backward compatibility.
>
> The first versions of these providers will enable building one of the most
> common use cases of LLMs, i.e. Question Answering / Chatbots using
> Retrieval-Augmented Generation (RAG) built on embeddings.
>
> Everyone is welcome and encouraged to contribute once the PRs are merged.
> Astronomer is committed to maintaining these providers in the Airflow repo,
> including reviewing PRs, maintaining code quality, testing and keeping the
> APIs up-to-date.
>
> Note: PgVector <https://github.com/pgvector/pgvector> is an open-source
> project, so we don’t need a formal vote for it as per our guidelines
> <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>.
> So please consider this email as seeking a Lazy Consensus for it.
>
> I will open up a VOTING thread after discussing this for a few days.
>
> Thanks.
>
> Regards,
>
> Kaxil
>
