To help people get started down this path, one of the teams at
Astronomer has created
<https://github.com/astronomer/airflow-provider-sample>. Then, if
you'd like (it's entirely optional), you can submit your provider to the
Astronomer Registry <https://registry.astronomer.io/> via
<https://registry.astronomer.io/publish>.
-a
On Mon, Apr 11 2022 at 16:22:25 +0100, Kaxil Naik wrote:
+1 -- I agree with Max & Rafal that this should be a provider living
in a separate repo and maintained by Flyte (if possible) so you can
quickly and easily make changes when you update your APIs.
Regards,
Kaxil
On Sun, 10 Apr 2022 at 13:04, Rafal Biegacz wrote:
Samhita & Max,
Maybe a good starting point would be to offer this provider as an
installable PyPI package, similar to what has been done for the
"Great Expectations" provider
(<https://pypi.org/project/airflow-provider-great-expectations/>)?
Regards, Rafal.
On Wed, Apr 6, 2022 at 8:31 PM Max Payton wrote:
I work at Lyft, and we use an internally written version of this
operator in about 5% of our DAGs (out of 2000).
Flyte doesn't really have a concept of sensors, nor can it interact
with other tasks in Airflow, so the operator is primarily useful for
scheduling ML pipelines that depend on Airflow-orchestrated
tables. It would be useful to us to have an officially sponsored
version of this operator maintained directly by the Flyte org.
*Max Payton*
He/Him/His
Software Engineer
202.441.7757
<http://www.lyft.com/>
On Fri, Apr 1, 2022 at 12:23 AM Samhita Alla wrote:
Hello,
I work on an open-source project called Flyte
<https://github.com/flyteorg/flyte>, which is a container-native,
structured programming and distributed processing platform that
enables highly concurrent, scalable, and maintainable workflows
for machine learning and data processing pipelines.
Since a significant share of the users who build /pipelines/ use
Airflow, we've been thinking about building a provider that bridges
the gap between Airflow and Flyte. It would help Airflow users keep
their existing ETL pipelines while using Flyte from within their
Airflow DAGs to run, say, machine learning jobs. At Lyft, where
Airflow and Flyte are used together, this operator is used
extensively to let Airflow DAGs interoperate with Flyte. Lately, our
users have also been requesting this
feature.
We've had this operator in the back of our minds for a long time;
here's the issue <https://github.com/flyteorg/flyte/issues/544>.
The Flyte team would like to know the community's thoughts on this
provider.
Many thanks,
Samhita