scr-oath opened a new issue, #33020:
URL: https://github.com/apache/airflow/issues/33020

   ### Description
   
   Provide a mechanism to pass data (XCom?) so that downstream DAGs can know 
more context about how, why, when, and by what they were triggered.
   
   ### Use case/motivation
   
   In order to avoid writing monolithic DAGs, it would seem useful to have 
separate DAGs focused on discrete Input and Output transforms, which would also 
allow them to be retried/rescheduled as needed. One could imagine daily 
batch processing composed of several DAGs and think of using the dataset 
mechanism as a way to trigger them efficiently. However, it seems that no 
information comes along with a dataset passed in each DAG's "schedule". If 
several days of daily tasks are (re-)scheduled, the outlet of a dataset would 
not be able to communicate to downstream DAGs what the "datestamp" was for them 
to process.
   
   As of now the dataset is just a string and, when loosely coupling a 
producer/consumer via the Dataset, there is no way to communicate specific 
information about the producer's exact output. There also doesn't appear to be 
a way to combine dataset-based scheduling with a time-based schedule such as 
`@daily`, so there's no way to connect a particular day's producer DAG run with 
the matching consumer DAG run.
   
   If a task could query its lineage and specifically get data / XCom 
information from the DAG/task/Dataset that triggered it, then it could take 
efficient actions based on the previous task's specific output location (e.g. 
its datestamp directory, if that's the convention, though it could be anything 
if a general way of passing/receiving data were provided).
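
   To make the request concrete, here is a minimal plain-Python sketch of the 
behaviour being asked for. It does not use or describe Airflow's API: 
`DatasetEvent`, `EventBus`, the `extra` payload, and the `s3://bucket/daily` URI 
are all hypothetical names chosen for illustration. The point is only that the 
producer attaches context (here a datestamp) to the event, and the consumer 
reads it when triggered.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DatasetEvent:
    """Hypothetical event: a dataset URI plus producer-supplied context."""

    uri: str
    extra: dict[str, Any] = field(default_factory=dict)


class EventBus:
    """Toy stand-in for a scheduler; not how Airflow is implemented."""

    def __init__(self) -> None:
        self._consumers: dict[str, list[Callable[[DatasetEvent], None]]] = {}

    def subscribe(self, uri: str, consumer: Callable[[DatasetEvent], None]) -> None:
        # Register a consumer to be triggered whenever `uri` is updated.
        self._consumers.setdefault(uri, []).append(consumer)

    def publish(self, event: DatasetEvent) -> None:
        # Trigger every consumer of this dataset, handing it the full event.
        for consumer in self._consumers.get(event.uri, []):
            consumer(event)


processed: list[str] = []


def consume(event: DatasetEvent) -> None:
    # The consumer learns which day's output to read from the event payload.
    processed.append(f"{event.uri}/{event.extra['datestamp']}")


bus = EventBus()
bus.subscribe("s3://bucket/daily", consume)

# Re-running several days of the producer yields one event per day,
# each carrying the datestamp the consumer should process.
for day in ("2023-07-30", "2023-07-31"):
    bus.publish(DatasetEvent("s3://bucket/daily", {"datestamp": day}))
```

   With a bare string as the dataset, the consumer in this sketch would be 
triggered twice but could not tell the two runs apart; the payload is what lets 
it pick the right datestamp directory.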
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
