Note: Weaviate <https://github.com/weaviate/weaviate> is also an open-source project, and its Python client had over 470k PyPI downloads last month <https://pypistats.org/packages/weaviate-client>.
On Thu, 19 Oct 2023 at 12:40, Kaxil Naik <kaxiln...@gmail.com> wrote:

> Absolutely, we will publish the results of the test runs somewhere. We would probably start by dumping them into a publicly accessible S3 bucket or a GitHub issue, and then move to a dashboard.
>
>> Re 2) I think this is a great opportunity for Astronomer to take the "3rd-party maintenance" role to follow the "System Test dashboard" idea.
>
> Yup, we run a lot of integration/system tests from Airflow main too, and when they break, we fix them with PRs to the main branch.
>
>> It could even be a super simple dump of HTML to a public S3 bucket like MWAA does (using Airflow to run it and the Airflow API to retrieve the status, for example), plus some alerting on Astronomer's side to detect (and fix before release) any issues. That would be more than enough and would check all the boxes for me.
>
> Regards,
> Kaxil
>
> On Wed, 18 Oct 2023 at 11:50, Jarek Potiuk <ja...@potiuk.com> wrote:
>
>> I thought a bit about it, and I think that with "Astronomer" behind it, it checks all the boxes, provided that we also have some (super simple) dashboard similar to the MWAA one <https://aws-mwaa.github.io/open-source/system-tests/dashboard.html>.
>>
>> From <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#rd-party-providers>:
>>
>> > While we already have - historically - a number of 3rd-party service providers managed by the community, most of those services have dedicated teams that keep an eye on the community providers and not only take active part in managing them (see mixed-governance model below), but also provide a way that we can verify whether the provider works with the latest version of the service via dashboards that show status of System Tests for the provider. This allows us to have a high level of confidence that when we release the provider it works with the latest version of the service.
>> >
>> > System Tests are part of the Airflow code, but they are executed and verified by those 3rd party service teams. We are working with the 3rd party service teams (who are often important stakeholders of the Apache Airflow project) to add dashboards for the historical providers that are managed by the community, and current set of Dashboards can be also found at the Ecosystem: system test dashboards
>>
>> Whenever someone (including Weaviate in the past) asked whether they could contribute providers, we always referred to that chapter and said "we need to have a good reason" and "we need to have confidence that the integration will not break in the future".
>>
>> So there are two conditions IMHO:
>>
>> 1) Having a good reason why we want it in
>> 2) Having confidence that we can keep the integration "working" in the future without a lot of overhead and without having to pay for the integration
>>
>> Re 1) I think there is a very good reason why we want to have those in the community: LLMs are all the rage, making LLMs first-class citizens in Airflow is a no-brainer, and Kaxil laid it out nicely in the email.
>> Re 2) I think this is a great opportunity for Astronomer to take the "3rd-party maintenance" role to follow the "System Test dashboard" idea.
>>
>> Of course Astronomer is going to be committed to it, no doubt about it :). And I believe Astronomer already uses Airflow managed instances to regularly run Airflow test cases (and more or less complex DAGs). As long as we have some basic example_dags/system_tests added for those providers, they are run regularly on Astronomer-managed instances with accounts for Weaviate and the others configured, and there is a simple dashboard where we can see the status of those DAG runs, we should be good to go.
>>
>> Not everyone here is aware of this, but a number of issues have already been fixed by the MWAA team simply because they were alerted by the regular system tests, so they could fix those issues before they made their way into new releases. I, for one, usually take a quick look at the dashboard before a new release of the providers, and it gives quite a lot of confidence that no "serious" issues are overlooked. Seeing a whole week of "all green" there is reassuring. It took quite an effort from the MWAA team to implement it and keep it running, but I think the scope and complexity of the LLM integrations are much lower, and those example DAGs should be far more stable and straightforward for Astronomer to run, because the LLM cases are generally much simpler than the "infrastructure" cases that the multi-service AWS integration requires.
>>
>> It could even be a super simple dump of HTML to a public S3 bucket like MWAA does (using Airflow to run it and the Airflow API to retrieve the status, for example), plus some alerting on Astronomer's side to detect (and fix before release) any issues. That would be more than enough and would check all the boxes for me.
>>
>> J.
>>
>> On Tue, Oct 17, 2023 at 8:42 PM Kaxil Naik <kaxiln...@apache.org> wrote:
>>
>> > Hey Everyone,
>> >
>> > As a follow-up to my keynote talk, "Building and deploying LLM applications with Apache Airflow" <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I am formally proposing the addition of these 5 providers to the Apache Airflow repo:
>> >
>> > - PgVector <https://github.com/pgvector/pgvector>
>> > - Weaviate <https://weaviate.io/>
>> > - Pinecone <https://www.pinecone.io/>
>> > - OpenAI <https://openai.com/>
>> > - Cohere <https://cohere.com/>
>> >
>> > Advancements in LLMs are moving at a rapid pace and transforming our industry and the way we work.
>> > Although LLMs are simple to use for prototyping, using LLMs for enterprise applications and in production still presents a lot of challenges. These <https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8> are some of the same problems that we tackle in Data Engineering, and Airflow is a natural fit for them.
>> >
>> > We at Astronomer would like to add first-class support for the popular LLMs (OpenAI & Cohere) and vector DBs (PgVector, Weaviate & Pinecone) so that Data Scientists and ML engineers can use them natively through easy-to-use Operator & Hook abstractions, with a native (and production-ready) approach to authentication, retries, logging, etc.
>> >
>> > We also think this is vital for the Apache Airflow project as we, the project, embrace the LLM tide and continue to be a great example of balancing innovation with maintaining backward compatibility.
>> >
>> > The first versions of these providers will enable building one of the most common LLM use cases, i.e. Question Answering / Chatbots using Retrieval-Augmented Generation (RAG) with the help of embeddings.
>> >
>> > Everyone is welcome and encouraged to contribute once the PRs are merged. Astronomer is committed to maintaining these providers in the Airflow repo, including reviewing PRs, maintaining code quality, testing and keeping the APIs up to date.
>> >
>> > Note: PgVector <https://github.com/pgvector/pgvector> is an open-source project, so we don’t need a formal vote for it as per our guidelines <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>. So please consider this email as seeking Lazy Consensus for it.
>> >
>> > I will open up a VOTING thread after discussing this for a few days.
>> >
>> > Thanks.
>> >
>> > Regards,
>> >
>> > Kaxil
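The "super simple" dashboard discussed in this thread (a static HTML dump built from DAG run states retrieved through the Airflow API, plus some alerting) could look roughly like the Python sketch below. It is only an illustration under stated assumptions: it assumes Airflow 2.x's stable REST API with the basic-auth API backend enabled and with the `order_by` query parameter supported on `/api/v1/dags/{dag_id}/dagRuns`. The function names are hypothetical, and this is not how the MWAA or Astronomer dashboards are actually built.

```python
# Hypothetical sketch of a system-test status page: read recent DAG run states
# from Airflow's stable REST API and render a static HTML page that could then
# be uploaded to a public S3 bucket. Not MWAA's or Astronomer's actual code.
import base64
import html
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def fetch_dag_runs(base_url: str, dag_id: str, user: str, password: str,
                   limit: int = 7) -> List[dict]:
    """Return the most recent runs of one DAG, newest first.

    Assumes Airflow 2.x with the basic-auth API backend enabled and support
    for the `order_by` query parameter on the dagRuns endpoint.
    """
    query = urllib.parse.urlencode({"limit": limit, "order_by": "-execution_date"})
    url = (f"{base_url.rstrip('/')}/api/v1/dags/"
           f"{urllib.parse.quote(dag_id)}/dagRuns?{query}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["dag_runs"]


def render_dashboard(runs_by_dag: Dict[str, List[dict]]) -> str:
    """Render one table row per system-test DAG, with one cell per recent run."""
    rows = []
    for dag_id, runs in sorted(runs_by_dag.items()):
        # Escape the state so the page stays valid HTML whatever the API returns.
        states = (html.escape(run.get("state") or "unknown") for run in runs)
        cells = "".join(f'<td class="{state}">{state}</td>' for state in states)
        rows.append(f"<tr><th>{html.escape(dag_id)}</th>{cells}</tr>")
    return ("<html><body><h1>System test status</h1><table>\n"
            + "\n".join(rows) + "\n</table></body></html>")


def failing_dags(runs_by_dag: Dict[str, List[dict]]) -> List[str]:
    """Return the DAG ids whose newest run failed: a minimal signal to alert on."""
    return sorted(dag_id for dag_id, runs in runs_by_dag.items()
                  if runs and runs[0].get("state") == "failed")
```

In use, one would call `fetch_dag_runs` for each system-test DAG (ids such as `example_weaviate` are hypothetical), write `render_dashboard(...)` to an `index.html`, upload it, for example with `aws s3 cp index.html s3://<bucket>/`, and raise an alert whenever `failing_dags(...)` is non-empty. Keeping the rendering and alerting logic separate from the HTTP call makes both easy to test without a running Airflow deployment.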