Thanks Vincent and Kaxil. Agreed, this can be added to future work.

Looking at the discussion on event mapping: today we know the source systems and we are aware of their schemas (S3, SQS, EventArc, etc.). However, as new services emerge and their schemas evolve, this may change. I agree with Kaxil that transformers are very effective at converting payloads into the required format before sending data to target systems. Would it be a good idea to define a schema for events that Airflow enforces? There are open standards, such as CloudEvents [1], that we could consider. This approach would allow us to maintain versions (v1, v2, ..., vn) as we evolve, providing utilities that users can easily plug in to send events to Airflow. It would also simplify maintenance within Airflow when handling incoming events, because we would receive a known schema version with each request. If this is already being addressed, please disregard my comments.

[1]: https://github.com/cloudevents/spec

Regards,
Pavan

On Fri, Aug 2, 2024 at 1:40 AM Kaxil Naik <kaxiln...@gmail.com> wrote:

> Yes, but the big difference is that you will create a single user for EventBridge, since that is the one sending requests to Airflow, a single user for EventArc on GCP, and one user for every other event listener or application, as compared to one user per type of payload (since Airflow will need to understand the payload of the original source). In that case, you would have one user/function mapping for S3, one user/function mapping for Redshift, and so on. The former approach is also consistent with our Connection model, where we have one standardized AWS & GCP connection that works for most, if not all, services.
>
> > but the user will have to be added anyway (some kind of service account - because the API needs to be authorized - that part is not changed). (Unless of course you want to use the same user for all kinds of external interfaces, which from a security point of view is a very bad idea - each external system should have its own "service account" - that's the best practice from a security point of view.)
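To make the CloudEvents suggestion earlier in this message concrete: a minimal sketch of a versioned envelope for an Airflow-bound event might look like the following. Only the top-level attributes come from the CloudEvents v1.0 spec; the `type` name and the `data` shape are hypothetical illustrations, not an agreed schema:

```python
import json
import uuid
from datetime import datetime, timezone

def build_cloudevent(source: str, asset_uri: str) -> str:
    """Build a CloudEvents 1.0 envelope carrying a hypothetical
    Airflow asset-event payload in the ``data`` attribute."""
    event = {
        # Required context attributes per the CloudEvents v1.0 spec
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": "org.apache.airflow.asset.updated",  # hypothetical type name
        # Optional attributes
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        # The payload shape below is an assumption, not an Airflow API
        "data": {"asset_uri": asset_uri, "extra": {}},
    }
    return json.dumps(event)

envelope = build_cloudevent("//my-company/object-store", "s3://my-bucket/data.csv")
```

Because `specversion` travels with every event, Airflow could dispatch to the right parser per version without guessing the producer's format.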
> On Fri, 2 Aug 2024 at 01:22, Jarek Potiuk <ja...@potiuk.com> wrote:
>
> > I am all for it - if we want to stick to EventBridge or similar and recommend it to our users, it's perfectly fine for me. It would be great, however, to add documentation explaining the steps and some examples - ideally for most of our providers and "standard" ways of triggering such an event. This is what I proposed originally when the first version of the document was created (to just document how to map the events externally).
> >
> > BTW. Yes - in this case you need to implement the logic in the event bridge, but the user will have to be added anyway (some kind of service account - because the API needs to be authorized - that part is not changed). (Unless of course you want to use the same user for all kinds of external interfaces, which from a security point of view is a very bad idea - each external system should have its own "service account" - that's the best practice from a security point of view.)
> >
> > J,
> >
> > On Fri, Aug 2, 2024 at 2:01 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> >
> > > I was discussing this with Vincent. In either case - same as now, or the one proposed in the AIP - a user will have to use something like AWS EventBridge [1] or GCP Eventarc [2], where users consume the event from object storage (S3 object creation, for example), and then they will have to add Airflow's Create Dataset Event endpoint to EventBridge [3]. Now, if you just customize the payload to build the URI, which is allowed (either via GET/POST) in EventBridge, it works right now.
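For illustration, the payload customization described above could be sketched as a small mapping from an EventBridge S3 "Object Created" event to a call against Airflow's create dataset event endpoint. The endpoint path `/api/v1/datasets/events` and body field `dataset_uri` follow the Airflow 2.x REST API, but should be checked against the deployed version; the input event shape follows EventBridge's S3 notification format:

```python
def s3_event_to_dataset_request(event: dict, airflow_base_url: str):
    """Map an EventBridge S3 'Object Created' event to the URL and JSON body
    for Airflow's create dataset event endpoint (path per the Airflow 2.x
    REST API; verify against your deployment)."""
    bucket = event["detail"]["bucket"]["name"]
    key = event["detail"]["object"]["key"]
    url = f"{airflow_base_url}/api/v1/datasets/events"
    body = {"dataset_uri": f"s3://{bucket}/{key}"}
    return url, body

url, body = s3_event_to_dataset_request(
    {"detail": {"bucket": {"name": "my-bucket"}, "object": {"key": "data.csv"}}},
    "https://airflow.example.com",
)
```

In EventBridge itself this same mapping would be expressed declaratively as an input transformer on the API destination, so no code runs on the user's side.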
> > > However, with the current proposal, a user will have to create a new user in Airflow plus some mapping to a function (either in the provider or a new user-defined function) that can understand that specific payload - in this example, the payload for S3 events. This will become huge, because it means that for each payload we will have to provide a new function and keep it updated. From the user's POV, they will need to create a new user every time for a new service (S3, Redshift, SNS, Bedrock, etc.). This will again likely have to go to the Auth manager backend. Compare that to what's available today - i.e. building a URI & extra metadata that works not only with EventBridge or Eventarc, but with any service.
> > >
> > > Since we already have to use things like EventBridge or Eventarc for managed service providers to transform the event, it fits well with the existing approach. An AWS blog [3] even has a similar example for Datadog, where they use the input transformer "{"detail":"$.detail"}" before sending it to Datadog's API.
> > >
> > > > "Having the producer of an event generate events in their standard way makes it easy for Airflow to consume them as dataset events without external entities":
> > >
> > > [1]: https://aws.amazon.com/eventbridge/ | https://docs.aws.amazon.com/AmazonS3/latest/userguide/EventBridge.html
> > > [2]: https://cloud.google.com/eventarc/docs
> > > [3]: https://aws.amazon.com/blogs/compute/using-api-destinations-with-amazon-eventbridge/
> > >
> > > On Fri, 2 Aug 2024 at 00:14, Jarek Potiuk <ja...@potiuk.com> wrote:
> > >
> > > > I proposed the mapping because it's the easiest way (I think) to map between the "native" source and the "airflow" target expectations. There are many producers of such events, and Airflow is the consumer. And it seems appropriate to have a way for our users to easily plug events produced by one system into our "events" API - without having to employ an external "mapper" (say, a lambda) doing the conversion. While I think it is indeed "a bit odd", it's a solution that might leverage most of what we have - authorisation and API exposure via a "user" in the API.
> > > >
> > > > While I - myself - find it a bit unusual, I think it might do the job. But I wonder if there is any alternative solution to the problem of "having the producer of an event generate events in their standard way, while making it easy for Airflow to consume them as dataset events without external entities".
> > > >
> > > > On Fri, Aug 2, 2024 at 12:57 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > >
> > > > > I would love for a VOTE to get started on this one. I think most of the commenters and those who replied to this email are happy with the proposal on the poll-based approach.
> > > > >
> > > > > Regarding the push-based approach, I am not convinced that the proposed implementation has any gains over what's already available with the Dataset Event Create API; the one-user-to-one-function mapping is an odd user experience. I'm curious to hear what others think.
> > > > >
> > > > > On Thu, 1 Aug 2024 at 17:39, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > >
> > > > > > I agree with both of you that it is indeed a good idea and that it can be added in Future work -- it doesn't need to be part of this AIP.
> > > > > >
> > > > > > > Thanks for the interest. I was not aware of such a feature and this looks really cool! I definitely think it can be useful for Airflow, especially for testing, when you can easily replay events received in the past.
> > > > > > > However, I do not think it should be part of the AIP and, as you mentioned, it should be a future work or follow-up item of the AIP. Please let me know if you (or anyone) disagree with this and we can talk about it. Otherwise I'll update the future work section of the AIP and mention this archive and replay feature.
> > > > > >
> > > > > > On Thu, 1 Aug 2024 at 16:11, Vincent Beck <vincb...@apache.org> wrote:
> > > > > >
> > > > > > > Hey Pavan,
> > > > > > >
> > > > > > > Thanks for the interest. I was not aware of such a feature and this looks really cool! I definitely think it can be useful for Airflow, especially for testing, when you can easily replay events received in the past. However, I do not think it should be part of the AIP and, as you mentioned, it should be a future work or follow-up item of the AIP. Please let me know if you (or anyone) disagree with this and we can talk about it. Otherwise I'll update the future work section of the AIP and mention this archive and replay feature.
> > > > > > >
> > > > > > > On 2024/08/01 01:21:58 Pavankumar Gopidesu wrote:
> > > > > > > > Thanks Vincent, I took a look - this is really good. I don't have access to the confluence page to comment :) so adding it here.
> > > > > > > >
> > > > > > > > As events arrive --> do some work --> end.
> > > > > > > >
> > > > > > > > So I'm uncertain if my comment pertains to the current poll/push model or if it fits as part of future work (I've seen event batching).
> > > > > > > >
> > > > > > > > Have you given any thought to an event archival mechanism and event replay?
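The archive-and-replay idea above (in the spirit of EventBridge Archive and Replay) can be sketched as a toy, in-memory illustration; all names are hypothetical and real storage, filtering, and delivery are omitted:

```python
class EventArchive:
    """Toy, in-memory sketch of archive-and-replay; a real implementation
    would persist events and filter by source/type, not just timestamp."""

    def __init__(self):
        self._events = []  # append-only list of (timestamp, event) pairs

    def archive(self, ts, event):
        self._events.append((ts, event))

    def replay(self, start, end, handler):
        # Re-deliver archived events whose timestamp falls in [start, end]
        for ts, event in self._events:
            if start <= ts <= end:
                handler(event)

archive = EventArchive()
archive.archive(1, {"uri": "s3://b/old.csv"})
archive.archive(5, {"uri": "s3://b/new.csv"})
replayed = []
archive.replay(4, 6, replayed.append)
```

The point for testing is that `handler` can be the same code path that consumes live events, so replayed events exercise a DAG exactly as the originals did.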
> > > > > > > > This could significantly aid in testing and in workflow recovery, and in testing new functionality by simply replaying events. The archival mechanism I am referring to is similar to what AWS offers today with EventBridge Archive and Replay.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Pavan
> > > > > > > >
> > > > > > > > On Thu, Aug 1, 2024 at 1:29 AM Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > I actually did manage to take a look, thanks for the work. I am +1 on the poll-based approach -- I left a comment on the push-based one: I am not sure why we need a function, since the create asset event API endpoint should have all the info needed for what the Asset was.
> > > > > > > > >
> > > > > > > > > On Thu, 1 Aug 2024 at 01:14, Kaxil Naik <kaxiln...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks Vincent, I will take a look again tomorrow.
> > > > > > > > > >
> > > > > > > > > > On Tue, 30 Jul 2024 at 18:47, Vincent Beck <vincb...@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi everyone,
> > > > > > > > > > >
> > > > > > > > > > > I updated AIP-82 given the different comments and concerns I received. I also tried to reply to all comments individually. I would really appreciate it if you could do a second pass and let me know what you think. Overall, this is what I changed in the AIP:
> > > > > > > > > > >
> > > > > > > > > > > - Push-based event-driven scheduling. I updated this section entirely because I received many concerns about the previous proposal.
> > > > > > > > > > > The overall idea now is to leverage the create asset event API endpoint to send notifications from external systems (e.g. a cloud provider) to the Airflow environment.
> > > > > > > > > > >
> > > > > > > > > > > - I updated the poll-based event-driven scheduling DAG author experience to use a message queue scenario. I understood that this is probably the main use case we are trying to cover with this AIP, so I used it as an example and mentioned it multiple times across the AIP.
> > > > > > > > > > >
> > > > > > > > > > > Thanks again for your time :)
> > > > > > > > > > >
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> > > > > > > > > > >
> > > > > > > > > > > Vincent
> > > > > > > > > > >
> > > > > > > > > > > On 2024/07/29 15:58:23 Vincent Beck wrote:
> > > > > > > > > > > > Thanks a lot all for the comments, this is very much appreciated! I received many comments from this thread and in Confluence, thanks again. I'll try to address them all in the AIP and will send an email in this thread once done. I will most likely revisit the push-based approach given the number of concerns I received. Thanks Jarek for proposing another solution, I'll probably go down that path.
> > > > > > > > > > > >
> > > > > > > > > > > > One follow-up question, Vikram.
> > > > > > > > > > > > > The bespoke triggerer approach completely makes sense for the long tail here, but can we do better for the 20% of scenarios which cover well over 80% of usage here is the question in my mind. Or, are you thinking of those as being covered in the "push" model?
> > > > > > > > > > > >
> > > > > > > > > > > > Could you share more details about what this "20% of scenarios which cover well over 80% of usage" is, please?
> > > > > > > > > > > >
> > > > > > > > > > > > Vincent
> > > > > > > > > > > >
> > > > > > > > > > > > On 2024/07/29 15:37:50 Kaxil Naik wrote:
> > > > > > > > > > > > > Thanks Vincent for driving these, I have added my comments to the AIP too.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > > Kaxil
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, 26 Jul 2024 at 20:16, Scheffler Jens (XC-AS/EAE-ADA-T) <jens.scheff...@de.bosch.com.invalid> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 on the comments of Vikram and Jarek, added main points on Confluence
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Sent from Outlook for iOS<https://aka.ms/o0ukef>
> > > > > > > > > > > > > > ________________________________
> > > > > > > > > > > > > > From: Vikram Koka <vik...@astronomer.io.INVALID>
> > > > > > > > > > > > > > Sent: Friday, July 26, 2024 8:46:55 PM
> > > > > > > > > > > > > > To: dev@airflow.apache.org <dev@airflow.apache.org>
> > > > > > > > > > > > > > Subject: Re: [DISCUSS] External event driven scheduling in Airflow
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Vincent,
> > > > > > > > > > > > > > Thanks for writing this up. The overview looks really good!
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I will leave my comments in the AIP as well, but at a high level they are both relatively focused on the "how" rather than the "what". With respect to the pull/polling approach, I completely agree that some incarnation of this is needed. I am less certain as to the how on this part. The bespoke triggerer approach completely makes sense for the long tail here, but can we do better for the 20% of scenarios which cover well over 80% of usage here is the question in my mind. Or, are you thinking of those as being covered in the "push" model?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Which leads to the "push" model approach. I am struggling with the same question that Jarek raised here about whether we need a new Airflow entity over and beyond the existing REST API for the same. I am concerned about this becoming a vector of attack on Airflow.
> > > > > > > > > > > > > > I see that this is a hot topic of discussion in the Confluence doc as well, but I wanted to summarize here too, so it didn't get lost in the threads of comments.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best regards,
> > > > > > > > > > > > > > Vikram
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On Fri, Jul 26, 2024 at 5:16 AM Jarek Potiuk <ja...@potiuk.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Thanks Vincent. I took a look and I have a general comment. I strongly think externally driven scheduling is really needed - especially, it should be much easier for a user to "plug in" such an external event into Airflow. And there are two parts of it - as correctly stated there - pull and push.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For the pull - I think it would be great to have a kind of specialized Trigger that is started when a DAG is parsed - and those Triggers could generate the events for DAGs.
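A stripped-down, Airflow-free sketch of such a specialized trigger follows; a plain `asyncio.Queue` stands in for a real subscription such as Pub/Sub, and an actual implementation would subclass Airflow's `BaseTrigger` and yield `TriggerEvent`s rather than plain dicts:

```python
import asyncio

class QueueAssetTrigger:
    """Simplified stand-in for an Airflow trigger: awaits messages on a queue
    and emits one 'asset event' per message, with no periodic polling."""

    def __init__(self, queue: asyncio.Queue, asset_uri: str):
        self.queue = queue
        self.asset_uri = asset_uri

    async def run(self):
        while True:
            message = await self.queue.get()  # suspends until a message arrives
            yield {"asset_uri": self.asset_uri, "payload": message}

async def demo():
    queue = asyncio.Queue()
    await queue.put({"key": "data.csv"})
    trigger = QueueAssetTrigger(queue, "s3://my-bucket/data.csv")
    async for event in trigger.run():
        return event  # take the first event and stop

event = asyncio.run(demo())
```

Because `await queue.get()` suspends the coroutine, many such triggers can share one event loop without any per-second polling, which is the asyncio-native behavior discussed below.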
> > > > > > > > > > > > > > > I think that's basically all that is needed; for example, I imagine a pubsub trigger that subscribes to messages coming in on the pubsub queue and fires an "Asset" event when a message is received. Not much controversy there. I am not sure about the polling thing, because I've always believed that when an "asyncio-native" Trigger runs in the asyncio event loop, we do not "poll" every second or so (but maybe that impression just comes from some specific triggers that actually do such a regular poll). But yes - there are polls, like running a select on the DB, that cannot easily be "async-ed", so having a configurable polling time would be good there (but I am not sure - maybe it's even possible today).
> > > > > > > > > > > > > > > I think this would be really great if we had that option, because it makes it much easier to set up authorization for Airflow users - rather than setting up authorization and REST calls coming from an external system, we can use Airflow Connections to authorize such a Trigger to subscribe to events.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For the push proposal - as I read it, the main point is, rather than users having to write the "Airflow" way of triggering events and configuring authentication (using the REST API) to generate asset events, to make Airflow natively understand external ways of pushing - effectively authorizing and mapping such incoming unauthorized requests into events that could otherwise be generated by a REST API call. I am honestly not sure if this is something that we want "running" in Airflow as an endpoint.
> > > > > > > > > > > > > > > I'd say such an unauthorised endpoint is probably not a good idea - for a variety of reasons, mostly security. And as I understand it, the goal is that users can easily point a "3rd-party" notification at Airflow and get the event generated.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > My feeling is that while this is needed, it should be externalised from the Airflow webserver. The authorization has to be set up additionally anyway - unlike in the "poll" case, we cannot use Connections for authorizing (because it's not Airflow that authorizes in an external system - it's the other way round). So we have to set up "something extra" in Airflow anyhow to authorize the external system. That could be what we have now - a user that allows us to trigger the event. Which means that our REST API could potentially be used the same way it is now, but we would need "something" (a library, a lambda function, etc.) that users could easily set up in the external system to map whatever trigger they generate natively (say, "S3 file created") to the Airflow REST API.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As I see it, it is quite common (and very practical) to deploy a cloud function or lambda that subscribes to the event received when an S3/GCS object is created. So it would be on the user to deploy such a lambda - but we **could** provide a library of those: say an S3 lambda and a GCP cloud function in the respective providers - with documentation on how to set them up and how to configure authorization, and we would be generally "done". I am just not sure we need a new entity in Airflow for that (Event receiver). It feels like it asks Airflow to take on more responsibility, when we are all thinking about what to "remove" from Airflow rather than "add" to it - especially when it comes to external integrations.
> > > > > > > > > > > > > > > It feels to me that Airflow should make it easy to be triggered by such an external system and easy to "map" to the way we expect events to be triggered, but this should be done outside of Airflow. If users searching our docs for "what do I do to externally trigger Airflow on an S3 change" can easily find either a) "configure polling in Airflow using an S3 Connection", or b) "create a user + deploy this lambda with those parameters" - that is "easy enough" and very practical as well.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > But maybe I am not seeing the whole picture and the real problem it's solving - so take it as a "first review pass" and "gut feeling".
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > J.
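The "deploy this lambda" option (b) above could be sketched roughly as follows - a hypothetical AWS Lambda handler mapping S3 notification records to Airflow's create dataset event endpoint. The endpoint path, token handling, and all names are assumptions for illustration, and the actual network call is left commented out:

```python
import json
import urllib.request

# Both values are assumptions: the endpoint path follows the Airflow 2.x
# REST API, and the token stands for a dedicated service account.
AIRFLOW_ENDPOINT = "https://airflow.example.com/api/v1/datasets/events"
AIRFLOW_TOKEN = "service-account-token"

def handler(event, context=None):
    """Hypothetical AWS Lambda entry point: map each S3 notification record
    to a POST against Airflow's create dataset event endpoint."""
    requests_built = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        body = json.dumps({"dataset_uri": f"s3://{bucket}/{key}"}).encode()
        req = urllib.request.Request(
            AIRFLOW_ENDPOINT,
            data=body,
            headers={
                "Content-Type": "application/json",
                "Authorization": f"Bearer {AIRFLOW_TOKEN}",
            },
            method="POST",
        )
        # urllib.request.urlopen(req)  # actual send, disabled in this sketch
        requests_built.append(req)
    return requests_built

reqs = handler({"Records": [{"s3": {"bucket": {"name": "b"}, "object": {"key": "k"}}}]})
```

Keeping the mapper outside Airflow, as argued above, means the only thing Airflow has to know is its own REST API and the service account behind the token.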
> > > > > > > > > > > > > > > On Thu, Jul 25, 2024 at 10:55 PM Beck, Vincent <vincb...@amazon.com.invalid> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Hello everyone,
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > I created a draft AIP regarding "External event driven scheduling in Airflow". This proposal is about adding the capability in Airflow to schedule DAGs based on external events. Here are some examples of such external events:
> > > > > > > > > > > > > > > > - A user signs up to one of the user pools defined in my cloud provider
> > > > > > > > > > > > > > > > - One of the databases used in my company has been updated
> > > > > > > > > > > > > > > > - A job in my cloud provider has been executed successfully
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > The intent of this AIP is to leverage datasets (which will soon be assets) and update them based on external events. I would like to propose this AIP for discussion and, more importantly, to hear some feedback from you :)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-82+External+event+driven+scheduling+in+Airflow
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Vincent

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
For additional commands, e-mail: dev-h...@airflow.apache.org