Absolutely, we will publish the results of the test runs somewhere. We would
probably start by dumping them into a publicly accessible S3 bucket or a
GitHub issue and then move to a dashboard.

> Re 2) I think this is a great opportunity for Astronomer to take on the
> "3rd-party maintenance" role and follow the "System Test dashboard" idea.


Yup, we run a lot of integration/system tests from Airflow main too, and
when they break, we fix them with PRs to the main branch.

> It could even be a super simple dump of HTML to a public S3 bucket, like
> MWAA does (using Airflow to run it and the Airflow API to retrieve the
> status, for example). That, plus some alerting on Astronomer's side to
> detect (and fix before release) any issues, would be more than enough and
> would check all the boxes for me.
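A minimal sketch of that "simple dump" idea, assuming Airflow 2.x with the stable REST API and basic auth enabled. The function names, DAG ids, host and credentials are hypothetical placeholders, and the S3 upload step itself is left out:

```python
# Minimal sketch (assumptions: Airflow 2.x stable REST API with basic auth
# enabled; DAG ids, host and credentials are hypothetical placeholders).
import base64
import html
import json
import urllib.parse
import urllib.request
from typing import Dict, List


def fetch_dag_runs(base_url: str, dag_id: str, user: str, password: str,
                   limit: int = 7) -> List[dict]:
    """Fetch the most recent runs of one system-test DAG via the REST API."""
    query = urllib.parse.urlencode({"limit": limit, "order_by": "-execution_date"})
    url = f"{base_url}/api/v1/dags/{urllib.parse.quote(dag_id)}/dagRuns?{query}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["dag_runs"]


def render_dashboard(runs_by_dag: Dict[str, List[dict]]) -> str:
    """Render a static HTML table with one row per DAG run, DAGs sorted by id."""
    rows = []
    for dag_id, runs in sorted(runs_by_dag.items()):
        for run in runs:
            state = run.get("state") or "unknown"
            logical_date = run.get("logical_date") or ""
            # The state doubles as a CSS class so a stylesheet can colour it.
            rows.append(
                f"<tr><td>{html.escape(dag_id)}</td>"
                f"<td>{html.escape(str(logical_date))}</td>"
                f'<td class="{html.escape(state)}">{html.escape(state)}</td></tr>'
            )
    header = "<tr><th>DAG</th><th>Logical date</th><th>State</th></tr>"
    return f"<html><body><table>{header}{''.join(rows)}</table></body></html>"
```

For example, a scheduled job could call `fetch_dag_runs` for each system-test DAG, pass the results to `render_dashboard`, write the string to a `dashboard.html` file and publish it to the public bucket (for instance with boto3's `upload_file`). Keeping rendering separate from fetching makes the page easy to regenerate or test without a live Airflow instance.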


Regards,
Kaxil

On Wed, 18 Oct 2023 at 11:50, Jarek Potiuk <ja...@potiuk.com> wrote:

> I thought a bit about it, and I think that with Astronomer behind it, it
> checks all the boxes, provided that we also have some (super simple)
> dashboard similar to the MWAA one:
> https://aws-mwaa.github.io/open-source/system-tests/dashboard.html .
>
> From
> https://github.com/apache/airflow/blob/main/PROVIDERS.rst#rd-party-providers :
> > While we already have - historically - a number of 3rd-party service
> > providers managed by the community, most of those services have dedicated
> > teams that keep an eye on the community providers and not only take active
> > part in managing them (see mixed-governance model below), but also provide
> > a way that we can verify whether the provider works with the latest version
> > of the service via dashboards that show status of System Tests for the
> > provider. This allows us to have a high level of confidence that when we
> > release the provider it works with the latest version of the service.
> > System Tests are part of the Airflow code, but they are executed and
> > verified by those 3rd party service teams. We are working with the 3rd
> > party service teams (who are often important stakeholders of the Apache
> > Airflow project) to add dashboards for the historical providers that are
> > managed by the community, and the current set of dashboards can also be
> > found at the Ecosystem: system test dashboards.
>
> Whenever someone (including Weaviate in the past) asked if they could
> contribute providers, we always referred to that chapter and said: "we
> need to have a good reason" and "we need to have confidence that the
> integration will not break in the future".
>
> So there are two conditions IMHO:
>
> 1) Having a good reason why we want it in
> 2) Having confidence that we can keep the integration "working" in the
> future without a lot of overhead and without having to pay for the
> integration
>
> Re 1) I think there is a very good reason to have those in the community:
> LLMs are all the rage, making LLMs first-class citizens in Airflow is a
> no-brainer, and Kaxil laid it out nicely in the email.
> Re 2) I think this is a great opportunity for Astronomer to take on the
> "3rd-party maintenance" role and follow the "System Test dashboard" idea.
>
> Of course, Astronomer is going to be committed to it, no doubt about it :)
> And I believe Astronomer already uses managed Airflow instances to
> regularly run Airflow test cases (and more or less complex DAGs). As long
> as we have some basic example_dags/system_tests added for those providers,
> they are run regularly on Astronomer-managed instances with accounts for
> Weaviate and the others configured, and there is some simple dashboard
> where we can see the status of those DAG runs, we should be good to go.
>
> Not everyone here is aware of it, but the MWAA team has already fixed a
> number of issues simply because they were alerted by the regular system
> tests, and they were able to fix those issues before they made their way
> into new releases. I, for one, usually take a quick look at the dashboard
> before a new providers release, and it gives me quite a lot of confidence
> that no "serious" issues are overlooked. Seeing a whole week of "all green"
> there is reassuring. It was quite an effort for the MWAA team to implement
> it and keep it running, but I think the scope and complexity of the LLM
> integrations are much lower, and those example DAGs should be far more
> stable and straightforward for Astronomer to run, because the LLM cases are
> generally much simpler than the "infrastructure" cases of the multiple
> services that the AWS integration requires.
>
> It could even be a super simple dump of HTML to a public S3 bucket, like
> MWAA does (using Airflow to run it and the Airflow API to retrieve the
> status, for example). That, plus some alerting on Astronomer's side to
> detect (and fix before release) any issues, would be more than enough and
> would check all the boxes for me.
>
>
> J.
>
>
> On Tue, Oct 17, 2023 at 8:42 PM Kaxil Naik <kaxiln...@apache.org> wrote:
>
> > Hey Everyone,
> >
> > As a follow-up to my keynote talk, "Building and deploying LLM
> > applications with Apache Airflow"
> > <https://www.youtube.com/watch?v=mgA6m3ggKhs&t=4s>, I am formally
> > proposing the addition of these 5 providers to the Apache Airflow repo:
> >
> >    - PgVector <https://github.com/pgvector/pgvector>
> >    - Weaviate <https://weaviate.io/>
> >    - Pinecone <https://www.pinecone.io/>
> >    - OpenAI <https://openai.com/>
> >    - Cohere <https://cohere.com/>
> >
> >
> > Advancements in LLMs are moving at a rapid pace and transforming the way
> > we work and our industry. Although LLMs are simple to use for prototyping,
> > using LLMs in enterprise applications and in production still presents a
> > lot of challenges. These
> > <https://speakerdeck.com/kaxil/building-and-deploying-llm-applications-with-apache-airflow?slide=8>
> > are some of the same problems that we tackle in Data Engineering, and
> > Airflow is a natural fit for them.
> >
> > We at Astronomer would like to add first-class support for popular LLMs
> > (OpenAI and Cohere) and vector DBs (PgVector, Weaviate and Pinecone) so
> > that data scientists and ML engineers can use them natively through
> > easy-to-use Operator and Hook abstractions, with a native (and
> > production-ready) approach to authentication, retries, logging, etc.
> >
> > We also think this is vital for the Apache Airflow project as it embraces
> > the LLM tide and continues to be a great example of balancing innovation
> > with backward compatibility.
> >
> > The first versions of these providers will enable one of the most common
> > LLM use cases, i.e. question answering / chatbots using
> > retrieval-augmented generation (RAG) with the help of embeddings.
> >
> > Everyone is welcome and encouraged to contribute once the PRs are merged.
> > Astronomer is committed to maintaining these providers in the Airflow
> > repo, including reviewing PRs, maintaining code quality, testing and
> > keeping the APIs up-to-date.
> >
> > Note: PgVector <https://github.com/pgvector/pgvector> is an open-source
> > project, so we don’t need a formal vote for it as per our guidelines
> > <https://github.com/apache/airflow/blob/main/PROVIDERS.rst#accepting-new-community-providers>.
> > So please consider this email as seeking a Lazy Consensus for it.
> >
> > I will open up a VOTING thread after discussing this for a few days.
> >
> > Thanks.
> >
> > Regards,
> >
> > Kaxil
> >
>
