dstandish opened a new pull request, #24908:
URL: https://github.com/apache/airflow/pull/24908

   exploration adding a dataset event [log] table which stores all events, even 
those which are ignored for the purpose of dag run queue records (e.g. say the 
dataset has already been updated once and the dag run is still waiting on other 
datasets)
   
   opens up some possibilities for more elaborate triggering behavior. it also 
provides a way to scrutinize, surface, and visualize all historical dataset 
events (the queue table's records are ephemeral)
   
   even before we have a chance to make enhancements beyond 2.4, i suspect it 
would be pretty easy for a user to write a long-running deferrable operator 
that consumes dataset events and does things with them, e.g. implements more 
complex dag run triggering rules based on the full event history.
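   to make the idea concrete, here's a minimal plain-python sketch (not using the real airflow API) of the kind of rule a user could evaluate over the full event history -- the `DatasetEvent` dataclass and `should_trigger` function are made up for illustration, standing in for rows of the proposed event log table:

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class DatasetEvent:
    """hypothetical, simplified stand-in for a row in the event log table."""
    dataset_uri: str
    timestamp: datetime
    extra: dict = field(default_factory=dict)

def should_trigger(events, required_uris, since):
    """example rule: fire only once every required dataset has been updated
    at least twice since `since` -- something the ephemeral queue records
    alone cannot express, but the full event history can."""
    counts = {}
    for e in events:
        if e.timestamp > since and e.dataset_uri in required_uris:
            counts[e.dataset_uri] = counts.get(e.dataset_uri, 0) + 1
    return all(counts.get(uri, 0) >= 2 for uri in required_uris)
```

   a deferrable operator could run a check like this in its trigger loop each time a new event lands, deferring again until the rule is satisfied.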
   
   one component i'm just starting to play with and think about is how to convey 
information to the consuming dag about what actually happened in the dataset 
event -- e.g. was it appended to, or replaced, this kind of thing. you can see 
in the change to example_datasets.py that i'm using post_execute to insert some 
metadata into a dataset event payload.
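   roughly, the payload-building step could look like this plain-python sketch -- the helper name and the `operation` / `rows_affected` keys are invented for illustration, not part of the PR; the idea is just that a producer-side hook merges operation details into the event's extra dict so the consumer can distinguish, say, an append from a full replace:

```python
def attach_event_metadata(event_extra: dict, operation: str, rows_affected: int) -> dict:
    """merge producer-side details into a dataset event payload (hypothetical
    keys) without mutating the original extra dict."""
    return {**event_extra, "operation": operation, "rows_affected": rows_affected}

# a post_execute-style hook would call this with whatever the task just did,
# e.g. attach_event_metadata({"source": "etl_task"}, "append", 128)
```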


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
