I agree with both of you that it is indeed a good idea and that it can be
added in Future work -- doesn't need to be part of this AIP.



On Thu, 1 Aug 2024 at 16:11, Vincent Beck <vincb...@apache.org> wrote:

> Hey Pavan,
>
> Thanks for the interest. I was not aware of such feature and this looks
> really cool! I definitely think that can be useful for Airflow, especially
> for testing when you can easily replay events received in the past.
> However, I do not think it should be part of the AIP and, as you mentioned,
> it should be future work or a follow-up item of the AIP. Please let me
> know if you (or anyone) disagree with this and we can talk about it.
> Otherwise I'll update the future work section of the AIP and mention this
> archive and replay feature.
>
> On 2024/08/01 01:21:58 Pavankumar Gopidesu wrote:
> > Thanks Vincent, I took a look, this is really good. Don't have access to
> > the confluence page to comment :) so adding it here.
> >
> > As events arrive --> do some work --> end.
> >
> > So I'm uncertain if my comment pertains to the current poll/push model or
> > if it fits as part of future work (seen event batching).
> >
> > Have you given any thought to an event archival mechanism and event
> > replay? This could significantly aid testing and recovery of workflows,
> > and testing new functionality with events by just replaying the events. The
> > archival mechanism I am referring to is similar to what AWS offers today
> > with EventBridge Archive and Replay.
> >
> > Regards,
> > Pavan
> >
> > On Thu, Aug 1, 2024 at 1:29 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > > I actually did manage to take a look, thanks for the work. I am +1 on the
> > > > poll-based approach -- left a comment on the push-based: I am not sure of
> > > > why we need a function since create asset event API endpoint should have
> > > > all info needed for what the Asset was.
> > >
> > > On Thu, 1 Aug 2024 at 01:14, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > >
> > > > Thanks Vincent, I will take a look again tomorrow.
> > > >
> > > > > On Tue, 30 Jul 2024 at 18:47, Vincent Beck <vincb...@apache.org> wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > > >> I updated the AIP-82 given the different comments and concerns I
> > > > >> received. I also tried to reply to all comments individually. I would
> > > > >> really appreciate it if you could do a second pass and let me know what
> > > > >> you think. Overall, this is what I changed in the AIP:
> > > > >>
> > > > >> - Push-based event-driven scheduling. I updated this section entirely
> > > > >> because I received many concerns about the previous proposal. The
> > > > >> overall idea now is to leverage the create asset event API endpoint to
> > > > >> send notifications from an external system (e.g. a cloud provider) to
> > > > >> the Airflow environment.
> > > > >>
> > > > >> - I updated the poll-based event-driven scheduling DAG author experience
> > > > >> to use a message queue scenario. I understood that this is probably the
> > > > >> main use case we are trying to cover with this AIP, thus I used it as
> > > > >> an example and mentioned it multiple times across the AIP.
> > > >>
> > > >> Thanks again for your time :)
> > > >>
> > > >>
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> > > >>
> > > >> Vincent
> > > >>
> > > >> On 2024/07/29 15:58:23 Vincent Beck wrote:
> > > > >> > Thanks a lot all for the comments, this is very much appreciated! I
> > > > >> > received many comments from this thread and in Confluence, thanks again.
> > > > >> > I'll try to address them all in the AIP and will send an email in this
> > > > >> > thread once done. I will most likely revisit the push-based approach
> > > > >> > given the number of concerns I received. Thanks Jarek for proposing
> > > > >> > another solution, I'll probably go down that path.
> > > >> >
> > > >> > One follow-up question Vikram.
> > > >> >
> > > > >> > > The bespoke triggerer approach completely makes sense for the long
> > > > >> > > tail here, but can we do better for the 20% of scenarios which cover
> > > > >> > > well over 80% of usage here is the question in my mind. Or, are you
> > > > >> > > thinking of those as being covered in the "push" model?
> > > > >> >
> > > > >> > Could you share more details about what this "20% of scenarios
> > > > >> > which cover well over 80% of usage" is, please?
> > > >> >
> > > >> > Vincent
> > > >> >
> > > >> > On 2024/07/29 15:37:50 Kaxil Naik wrote:
> > > > >> > > Thanks Vincent for driving these, I have added my comments to the
> > > > >> > > AIP too.
> > > >> > >
> > > >> > > Regards,
> > > >> > > Kaxil
> > > >> > >
> > > >> > > On Fri, 26 Jul 2024 at 20:16, Scheffler Jens (XC-AS/EAE-ADA-T)
> > > >> > > <jens.scheff...@de.bosch.com.invalid> wrote:
> > > >> > >
> > > > >> > > > +1 on the comments of Vikram and Jarek, added main points on
> > > > >> > > > Confluence
> > > >> > > >
> > > >> > > > ________________________________
> > > >> > > > From: Vikram Koka <vik...@astronomer.io.INVALID>
> > > >> > > > Sent: Friday, July 26, 2024 8:46:55 PM
> > > >> > > > To: dev@airflow.apache.org <dev@airflow.apache.org>
> > > > >> > > > Subject: Re: [DISCUSS] External event driven scheduling in Airflow
> > > >> > > >
> > > >> > > > Vincent,
> > > >> > > >
> > > >> > > > Thanks for writing this up. The overview looks really good!
> > > >> > > >
> > > > >> > > > I will leave my comments in the AIP as well, but at a high level
> > > > >> > > > they are both relatively focused on the "how", rather than the "what".
> > > > >> > > > With respect to the pull / polling approach, I completely agree
> > > > >> > > > that some incarnation of this is needed.
> > > > >> > > > I am less certain as to how on this part. The bespoke triggerer
> > > > >> > > > approach completely makes sense for the long tail here, but can we do
> > > > >> > > > better for the 20% of scenarios which cover well over 80% of usage
> > > > >> > > > here is the question in my mind. Or, are you thinking of those as
> > > > >> > > > being covered in the "push" model?
> > > > >> > > >
> > > > >> > > > Which leads to the "push" model approach.
> > > > >> > > > I am struggling with the same question that Jarek raised here about
> > > > >> > > > whether we need a new Airflow entity over and beyond the existing
> > > > >> > > > REST API for the same.
> > > > >> > > > I am concerned about this becoming a vector of attack on Airflow.
> > > > >> > > > I see that this is a hot topic of discussion in the Confluence doc
> > > > >> > > > as well, but wanted to summarize here as well, so it didn't get lost
> > > > >> > > > in the threads of comments.
> > > >> > > >
> > > >> > > > Best regards,
> > > >> > > > Vikram
> > > >> > > >
> > > >> > > >
> > > > >> > > > On Fri, Jul 26, 2024 at 5:16 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > >> > > >
> > > > >> > > > > Thanks Vincent. I took a look and I have a general comment. I
> > > > >> > > > > strongly think external driven scheduling is really needed -
> > > > >> > > > > especially, it should be much easier for a user to "plug-in" such an
> > > > >> > > > > external event to Airflow. And there are two parts of it - as
> > > > >> > > > > correctly stated there - pull and push.
> > > >> > > > >
> > > > >> > > > > For the pull - I think it would be great to have a kind of specialized
> > > > >> > > > > Triggers that will be started when a DAG is parsed - and those Triggers
> > > > >> > > > > could generate the events for DAGs. I think basically that's all that is
> > > > >> > > > > needed; for example, I imagine a pubsub trigger that will subscribe to
> > > > >> > > > > messages coming on the pubsub queue and fire an "Asset" event when a
> > > > >> > > > > message is received. Not much controversy there - I am not sure about the
> > > > >> > > > > polling thing, because I've always believed that when an "asyncio-native"
> > > > >> > > > > Trigger is run in the asyncio event loop, we do not "poll" every second or
> > > > >> > > > > so (but maybe this is just coming from some specific triggers that
> > > > >> > > > > actually do such a regular poll). But yes - there are polls like running a
> > > > >> > > > > select on the DB that cannot be easily "async-ed", so having a
> > > > >> > > > > configurable polling time would be good there (but I am not sure, maybe
> > > > >> > > > > it's even possible today). I think it would be really great if we have
> > > > >> > > > > that option, because it makes it much easier to set up the authorization
> > > > >> > > > > for Airflow users - rather than setting up authorization and REST calls
> > > > >> > > > > coming from an external system, we can utilize Airflow Connections to
> > > > >> > > > > authorize such a Trigger to subscribe to events.
> > > >> > > > >
> > > > >> > > > > For the push proposal - as I read the proposal, the main point behind it
> > > > >> > > > > is that, rather than users having to write the "Airflow" way of triggering
> > > > >> > > > > events and configuring authentication (using the REST API) to generate
> > > > >> > > > > asset events, Airflow would natively understand external ways of pushing -
> > > > >> > > > > effectively authorizing and mapping such incoming unauthorized requests
> > > > >> > > > > into an event that could be generated by a REST API call.
> > > > >> > > > > I am not really sure honestly if this is something that we want as
> > > > >> > > > > "running" in Airflow as an endpoint. I'd say such an unauthorized
> > > > >> > > > > endpoint is probably not a good idea - for a variety of reasons, mostly
> > > > >> > > > > security. And as I understand, the goal is that users can easily point a
> > > > >> > > > > "3rd-party" notification at Airflow and get the event generated.
> > > >> > > > >
> > > > >> > > > > My feeling is that while this is needed - it should be externalised from
> > > > >> > > > > the Airflow webserver. The authorization has to be set up additionally
> > > > >> > > > > anyway - unlike in the "poll" case - we cannot use Connections for
> > > > >> > > > > authorizing (because it's not Airflow that authorizes in an external
> > > > >> > > > > system - it's the other way round). So we have to set up "something
> > > > >> > > > > extra" in Airflow anyhow to authorize the external system. Which could be
> > > > >> > > > > what we have now - a user that allows us to trigger the event. Which
> > > > >> > > > > means that our REST API could potentially be used the same way it is now,
> > > > >> > > > > but we will need "something" (library, lambda function etc.) that users
> > > > >> > > > > could easily set up in the external system to map whatever trigger they
> > > > >> > > > > generate natively (say S3 file created) to the Airflow REST API.
> > > >> > > > >
> > > > >> > > > > As I see it - this is quite often used (and very practical): you deploy
> > > > >> > > > > a cloud function or lambda that subscribes to the event received when an
> > > > >> > > > > S3/GCS file is created. So it would be on the user to deploy such a
> > > > >> > > > > lambda - but we **could** provide a library of those: say an s3 lambda or
> > > > >> > > > > a gcp cloud function in the respective providers - with documentation on
> > > > >> > > > > how to set them up and how to configure authorization, and we would be
> > > > >> > > > > generally "done". I am just not sure if we need a new entity in Airflow
> > > > >> > > > > for that (Event receiver). It feels like it asks Airflow to take more
> > > > >> > > > > responsibility, when we all think about what to "remove" from Airflow
> > > > >> > > > > rather than "add" to it - especially when it comes to external
> > > > >> > > > > integrations. It feels to me that Airflow should make it easy to be
> > > > >> > > > > triggered by such an external system and make it easy to "map" to the way
> > > > >> > > > > we expect to get events triggered, but this should be done outside of
> > > > >> > > > > Airflow. If the users can easily find in our docs, when they search "what
> > > > >> > > > > do I do to externally trigger Airflow on S3 change", either a) "configure
> > > > >> > > > > polling in Airflow using an s3 Connection", or b) "create a user + deploy
> > > > >> > > > > this lambda with those parameters" - that is "easy enough" and very
> > > > >> > > > > practical as well.
> > > >> > > > >
> > > > >> > > > > But maybe I am not seeing the whole picture and the real problem it's
> > > > >> > > > > solving - so take it as a "first review pass" and "gut feeling".
> > > >> > > > >
> > > >> > > > > J.
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > >> > > > >
> > > > >> > > > > On Thu, Jul 25, 2024 at 10:55 PM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > > >> > > > >
> > > >> > > > > > Hello everyone,
> > > >> > > > > >
> > > > >> > > > > > I created a draft AIP regarding "External event driven scheduling in
> > > > >> > > > > > Airflow". This proposal is about adding the capability in Airflow to
> > > > >> > > > > > schedule DAGs based on external events. Here are some examples of such
> > > > >> > > > > > external events:
> > > > >> > > > > > - A user signs up to one of the user pools defined in my cloud provider
> > > > >> > > > > > - One of the databases used in my company has been updated
> > > > >> > > > > > - A job in my cloud provider has been executed successfully
> > > > >> > > > > >
> > > > >> > > > > > The intent of this AIP is to leverage datasets (which will soon be
> > > > >> > > > > > assets) and update them based on external events. I would like to
> > > > >> > > > > > propose this AIP for discussion and, more importantly, hear some
> > > > >> > > > > > feedback from you :)
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >>
> > >
> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcwiki.apache.org%2Fconfluence%2Fdisplay%2FAIRFLOW%2FAIP-82%2BExternal%2Bevent%2Bdriven%2Bscheduling%2Bin%2BAirflow&data=05%7C02%7CJens.Scheffler%40de.bosch.com%7C9e55ef9af31e4a669ef108dcada3a726%7C0ae51e1907c84e4bbb6d648ee58410f4%7C0%7C0%7C638576165598178951%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C&sdata=3FFvhCI6RA6sPhZoiOBAqzgyTkC6NNYqJYjBRVqEmUY%3D&reserved=0
> > > >> > > > <
> > > >> > > >
> > > >>
> > >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> > > >> > > > >
> > > >> > > > > >
> > > >> > > > > > Vincent
> > > >> > > > > >
> > > >> > > > >
> > > >> > > >
> > > >> > >
> > > >> >
> > > >> >
> ---------------------------------------------------------------------
> > > >> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > >> > For additional commands, e-mail: dev-h...@airflow.apache.org
> > > >> >
> > > >> >
> > > >>
> > > >>
> ---------------------------------------------------------------------
> > > >> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> > > >> For additional commands, e-mail: dev-h...@airflow.apache.org
> > > >>
> > > >>
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>
