jscheffl commented on issue #37810:
URL: https://github.com/apache/airflow/issues/37810#issuecomment-1977331326

   Hi @uranusjr, I have been thinking about the same/a similar feature for many
weeks now - especially for data-driven use cases. We also have a DAG that
potentially generates dataset events - but in our use case we need to attach
context to them - imagine the file name in an S3 dataset, or a UUID. And as
this context is needed and is dynamic, it would be a waste to create 1000's of
datasets for 1000's of events on specific files in S3.
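   To make the use case concrete, here is a rough sketch in plain Python
(illustrative names only, not an Airflow API): one dataset URI for the whole
S3 prefix, with the dynamic context carried per event instead of per dataset.

   ```python
   # Sketch only - `emit_dataset_event` is a hypothetical helper, not Airflow code.
   import uuid

   def emit_dataset_event(events, dataset_uri, extra):
       """Record an event on `dataset_uri` with per-event context."""
       events.append({"dataset": dataset_uri, "extra": extra})

   events = []
   for name in ["a.csv", "b.csv", "c.csv"]:
       emit_dataset_event(
           events,
           "s3://bucket/incoming",  # a single dataset for all files
           {"file": name, "uuid": str(uuid.uuid4())},  # dynamic per-event context
       )
   ```

   Three files, three events, but still only one dataset to register.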
   
   I like the idea of attaching `extra` to the events - but I was thinking a
bit more pragmatically, building on what we already have. Did you consider:
   - Using the resulting XCom (i.e. the return value of the task, assuming
`xcom_push=True`) as the context information? Then we would not need another
mechanism such as a `DatasetEventProxy` object to pass this information.
   - Alternatively, still using the `extra` field to store the task
output/XCom for traceability/persistence, and passing it as the trigger
`params` of the DAG being triggered by the dataset event - then we also would
not need a new mechanism to receive event data.
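   The flow I have in mind could look roughly like this (plain-Python sketch
with hypothetical names, not an Airflow API):

   ```python
   # Sketch: the producer's return value (its XCom) becomes the event `extra`,
   # which in turn becomes the `params` of the triggered DAG run.
   def producer_task():
       # Return value == XCom (assuming xcom_push=True).
       return {"file": "s3://bucket/incoming/a.csv", "rows": 1000}

   def trigger_from_event(xcom_value):
       event = {"dataset": "s3://bucket/incoming", "extra": xcom_value}
       # The triggered DAG run receives the event extra as its params.
       return {"params": event["extra"]}

   dag_run_conf = trigger_from_event(producer_task())
   ```

   No new consumer-side mechanism needed - the downstream DAG just reads its
`params` as it would for any manually triggered run.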
   
   A pro would be that the existing `params` JSON schema validation and
defaults could be reused, and you could also trigger a dataset-triggered DAG
manually for testing.
   A con would be that a task might raise an event with non-compliant data
that does not match the JSON schema. Where should the error be reported if the
resulting payload does not validate against the triggered DAG run's `params`
JSON schema - should the DAG run be marked as failed, or the emitting task?
But this might be a very small side effect to discuss.
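   To illustrate the validation question with a minimal stdlib stand-in (a
hand-rolled check instead of a real JSON schema validator):

   ```python
   # Stand-in for a `params` JSON schema: required keys and their types.
   PARAMS_SCHEMA = {"file": str, "rows": int}

   def validate(payload, schema):
       """Return True if the event payload satisfies the schema stand-in."""
       return all(
           key in payload and isinstance(payload[key], typ)
           for key, typ in schema.items()
       )
   ```

   The open question is what happens when `validate` returns False for an
emitted event: fail the triggered DAG run, or fail the emitting task?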


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
