I did manage to take a look, thanks for the work. I am +1 on the
poll-based approach. I left a comment on the push-based one: I am not sure
why we need a function, since the create asset event API endpoint should
already have all the info needed to identify the Asset.
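
To make that concrete, here is a minimal sketch of the kind of request I
have in mind. It is only an illustration: the path and the `dataset_uri`
field are assumptions modeled on the dataset-event endpoint of the Airflow
2.9 REST API (the asset-era names may differ), the function name, host and
bucket are hypothetical, and authentication is left out entirely.

```python
import json
import urllib.request

# Assumption: path and field name follow the dataset-event endpoint of the
# Airflow 2.9 stable REST API; the final "asset" naming may differ.
EVENTS_PATH = "/api/v1/datasets/events"


def build_asset_event_request(base_url, asset_uri, extra=None):
    """Build (but do not send) a POST request that creates an asset event.

    Hypothetical helper for illustration only. Authentication headers are
    deliberately omitted; a real caller would have to add them.
    """
    body = {"dataset_uri": asset_uri, "extra": extra or {}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + EVENTS_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The URI identifies the asset; "extra" carries whatever the external
# system knows about the event (values here are made up).
req = build_asset_event_request(
    "http://localhost:8080",
    "s3://my-bucket/incoming/data.csv",
    extra={"source": "s3:ObjectCreated"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

The point of the sketch is that the asset URI plus the free-form extra
payload already travel with the request, so the receiving side may not
need an additional function to work out which Asset the event was for.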

On Thu, 1 Aug 2024 at 01:14, Kaxil Naik <kaxiln...@gmail.com> wrote:

> Thanks Vincent, I will take a look again tomorrow.
>
> On Tue, 30 Jul 2024 at 18:47, Vincent Beck <vincb...@apache.org> wrote:
>
>> Hi everyone,
>>
>> I updated AIP-82 based on the comments and concerns I received. I
>> also tried to reply to all comments individually. I would really
>> appreciate it if you could do a second pass and let me know what you
>> think. Overall, this is what I changed in the AIP:
>>
>> - Push-based event-driven scheduling. I rewrote this section entirely
>> because I received many concerns about the previous proposal. The overall
>> idea now is to leverage the create asset event API endpoint to send
>> notifications from an external system (e.g. a cloud provider) to the
>> Airflow environment.
>>
>> - I updated the poll-based event-driven scheduling DAG author experience
>> to use a message queue scenario. I understand that this is probably the
>> main use case we are trying to cover with this AIP, so I used it as the
>> example and mentioned it multiple times across the AIP.
>>
>> Thanks again for your time :)
>>
>>
>> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
>>
>> Vincent
>>
>> On 2024/07/29 15:58:23 Vincent Beck wrote:
>> > Thanks a lot all for the comments, this is very much appreciated! I
>> > received many comments in this thread and in Confluence. I'll try to
>> > address them all in the AIP and will send an email in this thread once
>> > done. I will most likely revisit the push-based approach given the
>> > number of concerns I received. Thanks Jarek for proposing another
>> > solution, I'll probably go down that path.
>> >
>> > One follow-up question Vikram.
>> >
>> > > The bespoke triggerer approach completely makes sense for the long
>> > > tail here, but can we do better for the 20% of scenarios which cover
>> > > well over 80% of usage here is the question in my mind. Or, are you
>> > > thinking of those as being covered in the "push" model?
>> >
>> > Could you please share more details about this "20% of scenarios which
>> > cover well over 80% of usage"?
>> >
>> > Vincent
>> >
>> > On 2024/07/29 15:37:50 Kaxil Naik wrote:
>> > > Thanks Vincent for driving this, I have added my comments to the AIP
>> > > too.
>> > >
>> > > Regards,
>> > > Kaxil
>> > >
>> > > On Fri, 26 Jul 2024 at 20:16, Scheffler Jens (XC-AS/EAE-ADA-T)
>> > > <jens.scheff...@de.bosch.com.invalid> wrote:
>> > >
>> > > > +1 on the comments of Vikram and Jarek, added main points on
>> > > > Confluence
>> > > >
>> > > > ________________________________
>> > > > From: Vikram Koka <vik...@astronomer.io.INVALID>
>> > > > Sent: Friday, July 26, 2024 8:46:55 PM
>> > > > To: dev@airflow.apache.org <dev@airflow.apache.org>
>> > > > Subject: Re: [DISCUSS] External event driven scheduling in Airflow
>> > > >
>> > > > Vincent,
>> > > >
>> > > > Thanks for writing this up. The overview looks really good!
>> > > >
>> > > > I will leave my comments in the AIP as well, but at a high level
>> > > > they are both relatively focused on the "how", rather than the
>> > > > "what".
>> > > > With respect to the pull / polling approach, I completely agree
>> > > > that some incarnation of this is needed.
>> > > > I am less certain about the "how" of this part. The bespoke
>> > > > triggerer approach completely makes sense for the long tail here,
>> > > > but can we do better for the 20% of scenarios which cover well over
>> > > > 80% of usage here is the question in my mind. Or, are you thinking
>> > > > of those as being covered in the "push" model?
>> > > >
>> > > > Which leads to the "push" model approach.
>> > > > I am struggling with the same question that Jarek raised here:
>> > > > whether we need a new Airflow entity over and beyond the existing
>> > > > REST API for the same purpose.
>> > > > I am concerned about this becoming a vector of attack on Airflow.
>> > > > I see that this is a hot topic of discussion in the Confluence doc
>> > > > as well, but I wanted to summarize it here too, so it doesn't get
>> > > > lost in the threads of comments.
>> > > >
>> > > > Best regards,
>> > > > Vikram
>> > > >
>> > > >
>> > > > On Fri, Jul 26, 2024 at 5:16 AM Jarek Potiuk <ja...@potiuk.com>
>> > > > wrote:
>> > > >
>> > > > > Thanks Vincent. I took a look and I have a general comment. I
>> > > > > strongly think external event driven scheduling is really needed.
>> > > > > In particular, it should be much easier for a user to "plug in"
>> > > > > such an external event to Airflow. And there are two parts to it,
>> > > > > as correctly stated there: pull and push.
>> > > > >
>> > > > > For the pull - I think it would be great to have a kind of
>> > > > > specialized Trigger that is started when the DAG is parsed, and
>> > > > > those Triggers could generate the events for DAGs. I think
>> > > > > basically that's all that is needed. For example, I imagine a
>> > > > > pubsub trigger that subscribes to messages coming on the pubsub
>> > > > > queue and fires an "Asset" event when a message is received. Not
>> > > > > much controversy there. I am not sure about the polling thing,
>> > > > > because I've always believed that when an "asyncio-native" Trigger
>> > > > > runs in the asyncio event loop, we do not "poll" every second or
>> > > > > so (but maybe this is just coming from some specific triggers that
>> > > > > actually do such a regular poll). But yes, there are polls, like
>> > > > > running a select on the DB, that cannot be easily "async-ed", so
>> > > > > having a configurable polling time would be good there (though I
>> > > > > am not sure, maybe it's even possible today). I think it would be
>> > > > > really great to have that option, because it makes it much easier
>> > > > > to set up the authorization for Airflow users: rather than
>> > > > > setting up authorization and REST calls coming from an external
>> > > > > system, we can use Airflow Connections to authorize such a
>> > > > > Trigger to subscribe to events.
>> > > > >
>> > > > > For the push proposal - as I read it, the main point is that,
>> > > > > rather than users having to write the "Airflow" way of triggering
>> > > > > events and configuring authentication (using the REST API) to
>> > > > > generate asset events, Airflow would natively understand external
>> > > > > ways of pushing, effectively authorizing such incoming
>> > > > > unauthorized requests and mapping them into the event that could
>> > > > > be generated by a REST API call.
>> > > > > Honestly, I am not really sure this is something that we want
>> > > > > "running" in Airflow as an endpoint. I'd say such an unauthorized
>> > > > > endpoint is probably not a good idea, for a variety of reasons,
>> > > > > mostly security. And as I understand it, the goal is that users
>> > > > > can easily point a "3rd-party" notification at Airflow and get
>> > > > > the event generated.
>> > > > >
>> > > > > My feeling is that while this is needed, it should be
>> > > > > externalized from the Airflow webserver. The authorization has to
>> > > > > be set up additionally anyway: unlike in the "poll" case, we
>> > > > > cannot use Connections for authorizing (because it's not Airflow
>> > > > > that authorizes in an external system, it's the other way round).
>> > > > > So we have to set up "something extra" in Airflow anyhow to
>> > > > > authorize the external system. That could be what we have now: a
>> > > > > user that allows us to trigger the event. Which means that our
>> > > > > REST API could potentially be used the same way it is now, but we
>> > > > > will need "something" (library, lambda function etc.) that users
>> > > > > could easily set up in the external system to map whatever
>> > > > > trigger they generate natively (say S3 file created) to the
>> > > > > Airflow REST API.
>> > > > >
>> > > > > As I see it, this is quite often used (and very practical): you
>> > > > > deploy a cloud function or lambda that subscribes to the event
>> > > > > received when an S3/GCS object is created. So it would be on the
>> > > > > user to deploy such a lambda, but we **could** provide a library
>> > > > > of those (say an S3 lambda, a GCP cloud function) in the
>> > > > > respective providers, with documentation on how to set them up
>> > > > > and how to configure authorization, and we would be generally
>> > > > > "done". I am just not sure if we need a new entity in Airflow for
>> > > > > that (Event receiver). It feels like it asks Airflow to take on
>> > > > > more responsibility, when we are all thinking about what to
>> > > > > "remove" from Airflow rather than "add" to it, especially when it
>> > > > > comes to external integrations. It feels to me that Airflow
>> > > > > should make it easy to be triggered by such an external system
>> > > > > and make it easy to "map" to the way we expect to get events
>> > > > > triggered, but this should be done outside of Airflow. If users
>> > > > > can easily find in our docs, when they search "what do I do to
>> > > > > externally trigger Airflow on S3 change", either a) configure
>> > > > > polling in Airflow using an S3 Connection, or b) "create a user +
>> > > > > deploy this lambda with those parameters", that is "easy enough"
>> > > > > and very practical as well.
>> > > > >
>> > > > > But maybe I am not seeing the whole picture and the real problem
>> > > > > it's solving, so take it as a "first review pass" and a "gut
>> > > > > feeling".
>> > > > >
>> > > > > J.
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, Jul 25, 2024 at 10:55 PM Beck, Vincent
>> > > > > <vincb...@amazon.com.invalid> wrote:
>> > > > >
>> > > > > > Hello everyone,
>> > > > > >
>> > > > > > I created a draft AIP regarding "External event driven
>> > > > > > scheduling in Airflow". This proposal is about adding the
>> > > > > > capability in Airflow to schedule DAGs based on external
>> > > > > > events. Here are some examples of such external events:
>> > > > > > - A user signs up to one of the user pools defined in my cloud
>> > > > > >   provider
>> > > > > > - One of the databases used in my company has been updated
>> > > > > > - A job in my cloud provider has been executed successfully
>> > > > > >
>> > > > > > The intent of this AIP is to leverage datasets (which will
>> > > > > > soon be assets) and update them based on external events. I
>> > > > > > would like to propose this AIP for discussion and, more
>> > > > > > importantly, hear some feedback from you :)
>> > > > > >
>> > > > > >
>> > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
>> > > > > >
>> > > > > > Vincent
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
>> > For additional commands, e-mail: dev-h...@airflow.apache.org
>> >
>> >
