Thanks Vincent, I took a look, and this is really good. I don't have access
to the Confluence page to comment :) so I'm adding my comment here.

As I understand the flow: events arrive --> do some work --> end.

So I'm not sure whether my comment pertains to the current poll/push model
or belongs to future work (where I saw event batching mentioned).

Have you given any thought to an event archival mechanism and event
replay? This could significantly aid testing and recovery of workflows,
as well as testing new functionality by simply replaying the events. The
archival mechanism I am referring to is similar to what AWS offers today
with EventBridge Archive and Replay.
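
To make the archive/replay idea a bit more concrete, here is a rough sketch
of what I have in mind. Everything in it (EventArchive, ArchivedEvent,
record, replay) is a hypothetical name made up for illustration; it is not
an existing Airflow or EventBridge API, and the in-memory list only stands
in for whatever durable storage a real archive would use.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class ArchivedEvent:
    """One external event as it was received (hypothetical shape)."""
    received_at: datetime
    source: str
    payload: Dict[str, Any]


@dataclass
class EventArchive:
    """Hypothetical in-memory archive; a real one would need durable storage."""
    _events: List[ArchivedEvent] = field(default_factory=list)

    def record(
        self,
        source: str,
        payload: Dict[str, Any],
        received_at: Optional[datetime] = None,
    ) -> ArchivedEvent:
        # Store a copy of the payload so later mutation by the caller
        # does not change what was archived.
        event = ArchivedEvent(
            received_at or datetime.now(timezone.utc), source, dict(payload)
        )
        self._events.append(event)
        return event

    def replay(
        self,
        handler: Callable[[ArchivedEvent], None],
        start: Optional[datetime] = None,
        end: Optional[datetime] = None,
        source: Optional[str] = None,
    ) -> int:
        """Re-deliver matching events in arrival order; return the count."""
        selected = [
            e
            for e in sorted(self._events, key=lambda e: e.received_at)
            if (start is None or e.received_at >= start)
            and (end is None or e.received_at < end)
            and (source is None or e.source == source)
        ]
        for event in selected:
            handler(event)
        return len(selected)


# Example: record two events, then replay only those from one source.
archive = EventArchive()
archive.record("orders-queue", {"order_id": 1})
archive.record("billing-queue", {"invoice_id": 7})
replayed: List[ArchivedEvent] = []
archive.replay(replayed.append, source="orders-queue")
```

In a test or recovery scenario, the handler would be whatever normally
reacts to the event (for example, the code that creates the asset event),
so the same events can be fed through new logic without touching the
external system again.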

Regards,
Pavan

On Thu, Aug 1, 2024 at 1:29 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> I actually did manage to take a look, thanks for the work. I am +1 on the
> poll-based approach -- left a comment on the push-based: I am not sure of
> why we need a function since create asset event API endpoint should have
> all info needed for what the Asset was.
>
> On Thu, 1 Aug 2024 at 01:14, Kaxil Naik <kaxiln...@gmail.com> wrote:
>
> > Thanks Vincent, I will take a look again tomorrow.
> >
> > On Tue, 30 Jul 2024 at 18:47, Vincent Beck <vincb...@apache.org> wrote:
> >
> >> Hi everyone,
> >>
> >> I updated the AIP-82 given the different comments and concerns I
> >> received. I also tried to reply to all comments individually. I would
> >> really appreciate if you can do a second pass and let me know what you
> >> think. Overall, this is what I changed in the AIP:
> >>
> >> - Push based event-driven scheduling. I updated this section entirely
> >> because I received many concerns about the previous proposal. The overall
> >> idea now is to leverage the create asset event API endpoint to send
> >> notifications from external systems (e.g. a cloud provider) to the
> >> Airflow environment.
> >>
> >> - I updated the poll based event-driven scheduling DAG author experience
> >> to use a message queue scenario. I understood that this is probably the
> >> main use case we are trying to cover with this AIP, thus I used it as
> >> example and mentioned it multiple times across the AIP.
> >>
> >> Thanks again for your time :)
> >>
> >>
> >> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> >>
> >> Vincent
> >>
> >> On 2024/07/29 15:58:23 Vincent Beck wrote:
> >> > Thanks a lot all for the comments, this is very much appreciated! I
> >> received many comments from this thread and in confluence, thanks again.
> >> I'll try to address them all in the AIP and will send an email in this
> >> thread once done. I will most likely revisit the push-based approach
> given
> >> the number of concerns I received, thanks Jarek for proposing another
> >> solution, I'll probably go down that path.
> >> >
> >> > One follow-up question Vikram.
> >> >
> >> > > The bespoke triggerer approach completely makes sense for the long
> >> tail here, but can we do better for the 20% of scenarios which cover
> well
> >> over 80% of usage here is the question in my mind. Or, are you thinking
> of
> >> those as being covered in the "push" model?
> >> >
> >> > Could you share more details about what is this "20% of scenarios
> which
> >> cover well over 80% of usage" please?
> >> >
> >> > Vincent
> >> >
> >> > On 2024/07/29 15:37:50 Kaxil Naik wrote:
> >> > > Thanks Vincent for driving these, I have added my comments to the
> AIP
> >> too.
> >> > >
> >> > > Regards,
> >> > > Kaxil
> >> > >
> >> > > On Fri, 26 Jul 2024 at 20:16, Scheffler Jens (XC-AS/EAE-ADA-T)
> >> > > <jens.scheff...@de.bosch.com.invalid> wrote:
> >> > >
> >> > > > +1 on the comments of Vikram and Jarek, added main points on
> >> confluence
> >> > > >
> >> > > > ________________________________
> >> > > > From: Vikram Koka <vik...@astronomer.io.INVALID>
> >> > > > Sent: Friday, July 26, 2024 8:46:55 PM
> >> > > > To: dev@airflow.apache.org <dev@airflow.apache.org>
> >> > > > Subject: Re: [DISCUSS] External event driven scheduling in Airflow
> >> > > >
> >> > > > Vincent,
> >> > > >
> >> > > > Thanks for writing this up. The overview looks really good!
> >> > > >
> >> > > > I will leave my comments in the AIP as well, but at a high level
> >> they are
> >> > > > both relatively focused on the "how", rather than the "what".
> >> > > > With respect to the pull / polling approach, I completely agree
> >> that some
> >> > > > incarnation of this is needed.
> >> > > > I am less certain as to how on this part. The bespoke triggerer
> >> approach
> >> > > > completely makes sense for the long tail here, but can we do
> better
> >> for the
> >> > > > 20% of scenarios which cover well over 80% of usage here is the
> >> question in
> >> > > > my mind. Or, are you thinking of those as being covered in the
> >> "push"
> >> > > > model?
> >> > > >
> >> > > > Which leads to the "push" model approach.
> >> > > > I am struggling with the same question that Jarek raised here
> about
> >> whether
> >> > > > we need a new Airflow entity over and beyond the existing REST API
> >> for the
> >> > > > same.
> >> > > > I am concerned about this becoming a vector of attack on Airflow.
> >> > > > I see that this is a hot topic of discussion in the Confluence doc
> >> as well,
> >> > > > but wanted to summarize here as well, so it didn't get lost in the
> >> threads
> >> > > > of comments.
> >> > > >
> >> > > > Best regards,
> >> > > > Vikram
> >> > > >
> >> > > >
> >> > > > On Fri, Jul 26, 2024 at 5:16 AM Jarek Potiuk <ja...@potiuk.com>
> >> wrote:
> >> > > >
> >> > > > > Thanks Vincent. I took a look and I have a general comment. I
> >> > > > > strongly think external driven scheduling is really needed -
> >> especially,
> >> > > > it
> >> > > > > should be much easier for a user to "plug-in" such an external
> >> event to
> >> > > > > Airflow. And there are two parts of it - as correctly stated
> >> there - pull
> >> > > > > and push.
> >> > > > >
> >> > > > > For the pull - I think it would be great to have a kind of
> >> specialized
> >> > > > > Triggers that will be started when DAG is parsed - and those
> >> Triggers
> >> > > > could
> >> > > > > generate the events for DAGs. I think basically that's all that
> is
> >> > > > needed,
> >> > > > > for example I imagine a pubsub trigger that will subscribe to
> >> messages
> >> > > > > coming on the pubsub queue and fire "Asset" event when a message
> >> is
> >> > > > > received. Not much controversy there - I am not sure about the
> >> polling
> >> > > > > thing , because I've always believed that when "asyncio-native"
> >> Trigger
> >> > > > is
> >> > > > > run in the asyncio event loop, we do not "poll" every second or
> >> so (but
> >> > > > > maybe this is just coming from some specific triggers  that
> >> actually do
> >> > > > > such regular poll. But yes - there are polls  like running
> select
> >> on the
> >> > > > DB
> >> > > > > that cannot be easily "async-ed" so having a configurable
> polling
> >> time
> >> > > > > would be good there (but I am not sure maybe it's even possible
> >> today). I
> >> > > > > think this would be really great if we have that option, because
> >> it makes
> >> > > > > it much easier to set up the authorization for Airflow users -
> >> rather
> >> > > > than
> >> > > > > setting up authorization and REST calls coming from an external
> >> system,
> >> > > > we
> >> > > > > can utilize Connections of Airflow to authorize such a Trigger
> to
> >> > > > subscribe
> >> > > > > to events.
> >> > > > >
> >> > > > > For the push proposal -  as I read the proposal, the main point
> >> behind it
> >> > > > > is rather than users having to write "Airflow" way of triggering
> >> events
> >> > > > and
> >> > > > > configuring authentication (using REST API) to generate asset
> >> events, is
> >> > > > to
> >> > > > > make Airflow natively understand external ways of pushing - and
> >> > > > effectively
> >> > > > > authorizing and mapping such incoming unauthorized requests into
> >> event
> >> > > > that
> >> > > > > could be generated by an API REST call.
> >> > > > > I am not really sure honestly if this is something that we want
> as
> >> > > > > "running" in Airflow as an endpoint. I'd say such an
> unauthorised
> >> > > > endpoint
> >> > > > > is probably not a good idea - for a variety of reasons, mostly
> >> security.
> >> > > > > And as I understand the goal is that users can easily point at
> >> > > > "3rd-party"
> >> > > > > notification to Airflow and get the event generated.
> >> > > > >
> >> > > > > My feeling is that while this is needed - it should be
> >> externalised from
> >> > > > > Airflow webserver. The authorization has to be set up anyway
> >> > > > additionally -
> >> > > > > unlike in "poll" case - we cannot use Connections for
> authorizing
> >> > > > (because
> >> > > > > it's not Airflow that authorizes in an external system - it's
> the
> >> other
> >> > > > way
> >> > > > > round). So we have to anyhow setup "something extra" in Airflow
> to
> >> > > > > authorize the external system. Which could be what we have now -
> >> user
> >> > > > that
> >> > > > > allows us to trigger the event. Which means that our REST API
> >> could
> >> > > > > potentially be used the same way it is now, but we will need
> >> "something"
> >> > > > > (library, lambda function etc.) that users could easily setup in
> >> the
> >> > > > > external system to map whatever trigger they generate natively
> >> (say S3
> >> > > > file
> >> > > > > created) to Airflow REST API.
> >> > > > >
> >> > > > > As I see it - this is quite often used (and very practical, that
> >> you
> >> > > > deploy
> >> > > > > a cloud function or lambda that subscribes on the event received
> >> when
> >> > > > > S3/GCS is created. So it would be on the user to deploy such a
> >> lambda -
> >> > > > but
> >> > > > > we **could** provide a library of those: say s3 lambda, gcp
> cloud
> >> > > > function
> >> > > > > in respective providers - with documentation how to set them up,
> >> and how
> >> > > > to
> >> > > > > configure authorization and we would be generally "done". I am
> >> just not
> >> > > > > sure if we need a new entity in Airflow for that (Event
> >> receiver). It
> >> > > > feels
> >> > > > > like it asks Airflow to take more responsibility, when we all
> >> think on
> >> > > > what
> >> > > > > to "remove" from Airflow rather than "add" to it - especially
> >> when it
> >> > > > comes
> >> > > > > to external integrations. It feels to me that Airflow should
> make
> >> it easy
> >> > > > > to be triggered by such an external system and make it easy to
> >> "map" to
> >> > > > the
> >> > > > > way we expect to get events triggered, but this should be done
> >> outside of
> >> > > > > Airflow. If the users can easily find in our docs when they
> >> search "what
> >> > > > do
> >> > > > > I do to externally trigger Airflow on S3 change": either a)
> >> configure
> >> > > > > polling in airflow using s3 Connection, or b) "create a user +
> >> deploy
> >> > > > this
> >> > > > > lambda with those parameters"  - that is "easy enough" and very
> >> practical
> >> > > > > as well.
> >> > > > >
> >> > > > > But maybe I am not seeing the whole picture and the real problem
> >> it's
> >> > > > > solving - so take it as a "first review pass" and "gut
> feeling".
> >> > > > >
> >> > > > > J.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > On Thu, Jul 25, 2024 at 10:55 PM Beck, Vincent
> >> > > > <vincb...@amazon.com.invalid
> >> > > > > >
> >> > > > > wrote:
> >> > > > >
> >> > > > > > Hello everyone,
> >> > > > > >
> >> > > > > > I created a draft AIP regarding "External event driven
> >> scheduling in
> >> > > > > > Airflow". This proposal is about adding capability in Airflow
> to
> >> > > > schedule
> >> > > > > > DAGs based on external events. Here are some examples of such
> >> external
> >> > > > > > events:
> >> > > > > > - A user signs up to one of the user pools defined in my cloud
> >> provider
> >> > > > > > - One of the databases used in my company has been updated
> >> > > > > > - A job in my cloud provider has been executed successfully
> >> > > > > >
> >> > > > > > The intent of this AIP is to leverage datasets (which will be
> >> soon
> >> > > > > assets)
> >> > > > > > and update them based on external events. I would like to
> >> propose this
> >> > > > > AIP
> >> > > > > > for discussion and more importantly, hear some feedback from
> >> you :)
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> >> > > > > >
> >> > > > > > Vincent
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> > For additional commands, e-mail: dev-h...@airflow.apache.org
> >> >
> >> >
> >>
> >>
> >>
>
