jscheffl commented on issue #37810: URL: https://github.com/apache/airflow/issues/37810#issuecomment-1977331326
Hi @uranusjr, I have been thinking about the same/similar feature for many weeks, especially for data-driven use cases. We also have a DAG that potentially generates dataset events, but in our use case we need to attach context, for example the file name in the S3 dataset or a UUID. Since this context is needed and is dynamic, it would be a waste to create thousands of datasets for thousands of events on specific files in S3.

I like the idea of attaching `extra` to the events, but I was thinking a bit more pragmatically about what we already have. Did you consider:

- Using the resulting XCom (i.e. the return value of the task, assuming `xcom_push=True`) as the context information? Then we would not need another mechanism such as a `DatasetEventProxy` object to pass this information.
- Maybe still using the `extra` field to store the task output/XCom there for traceability/persistence, and passing it as the DAG trigger `params` for the DAG being triggered from the dataset event? Then we also would not need a new mechanism to receive event data. Pro: the existing `params` JSON schema validation and defaults could be reused, and you could also trigger a dataset-triggered DAG manually for testing. Con: a task might emit an event with non-compliant data that does not match the JSON schema. Where should the error be reported if the resulting payload does not validate against the triggered DAG run's `params` JSON schema: should the triggered DAG run fail, or the emitting task? But this might be a very small side effect to discuss.
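To make the Con concrete, here is a minimal, self-contained sketch of the validation question: an event payload is checked against the triggered DAG's `params` JSON schema before being used. The function name `validate_against_params_schema` and the hand-rolled subset-of-JSON-Schema validator are purely illustrative assumptions, not Airflow APIs; Airflow's real `params` validation works differently.

```python
# Hypothetical sketch (not an Airflow API): check an emitted event payload
# against the params schema of the DAG that would be triggered by the event.

def validate_against_params_schema(payload: dict, schema: dict) -> list:
    """Return a list of validation errors; an empty list means the payload is valid.

    Supports only the tiny JSON Schema subset needed for this sketch:
    `required` keys and per-property `type` checks.
    """
    type_map = {"string": str, "integer": int, "object": dict, "array": list}
    errors = []
    for key in schema.get("required", []):
        if key not in payload:
            errors.append("missing required param %r" % key)
    for key, spec in schema.get("properties", {}).items():
        if key in payload and not isinstance(payload[key], type_map[spec["type"]]):
            errors.append("param %r is not of type %s" % (key, spec["type"]))
    return errors


# Schema the dataset-triggered DAG would declare for its `params`.
dag_params_schema = {
    "required": ["s3_key"],
    "properties": {"s3_key": {"type": "string"}, "batch_id": {"type": "string"}},
}

# A compliant payload passes; a non-compliant one surfaces errors that have to
# be reported somewhere -- the open question raised above.
assert validate_against_params_schema(
    {"s3_key": "incoming/file-0001.csv", "batch_id": "42"}, dag_params_schema
) == []
assert validate_against_params_schema({"batch_id": 42}, dag_params_schema) == [
    "missing required param 's3_key'",
    "param 'batch_id' is not of type string",
]
```

The sketch only illustrates where a validation failure would surface; whether it should fail the emitting task or the triggered DAG run is exactly the design question posed above.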
