blag commented on PR #36075: URL: https://github.com/apache/airflow/pull/36075#issuecomment-1843333200
Please refer to the PR implementing this, and the discussion about an `extra` parameter [here](https://github.com/apache/airflow/pull/25419#pullrequestreview-1061066994).

IIRC, the intent here was _not_ to break database normalization and copy `DatasetModel.extra` for every `DatasetEvent`. It was to enable third-party `DatasetEventManager`s to record additional data from external sources (including non-Airflow sources) with a minimum of fuss. The `extra` fields of `DatasetModel` and `DatasetEvent` are similarly named, but they serve different and distinct purposes, and I did _not_ intend for `DatasetEvent.extra` to be copied from `DatasetModel.extra`.

Given that, I would not expect `DatasetEvent.extra` to be used within Airflow: if we wanted to store more information in the `DatasetEvent` table, we would just update the database schema. However, just because it is not used internally within Airflow does not mean it cannot be useful.

IMO #35297 is invalid, because tasks should be able to query the Airflow DB directly for the dataset extra field using the keys in `triggering_dataset_events`, and should _absolutely not_ expect the dataset extra field to be copied to all dataset event extra fields.
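
As a rough illustration of the lookup described above, here is a minimal sketch using an in-memory SQLite database. It is not Airflow's actual schema or API: the `dataset` and `dataset_event` tables, their columns, the sample URI, and the `dataset_extras` helper are all hypothetical stand-ins, and it assumes the keys of `triggering_dataset_events` are dataset URIs. The point is only that the dataset-level `extra` lives in one row and is fetched by key, rather than being copied into every event row.

```python
import json
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy, normalized schema (hypothetical, not Airflow's real tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dataset (
            id    INTEGER PRIMARY KEY,
            uri   TEXT UNIQUE NOT NULL,
            extra TEXT NOT NULL            -- dataset-level metadata, stored once
        );
        CREATE TABLE dataset_event (
            id         INTEGER PRIMARY KEY,
            dataset_id INTEGER NOT NULL REFERENCES dataset(id),
            extra      TEXT NOT NULL       -- per-event data, e.g. from an external source
        );
        """
    )
    conn.execute(
        "INSERT INTO dataset (id, uri, extra) VALUES (1, 's3://bucket/orders', ?)",
        (json.dumps({"owner": "data-team"}),),
    )
    # Event extras are independent of the dataset extra; nothing is copied over.
    conn.executemany(
        "INSERT INTO dataset_event (dataset_id, extra) VALUES (?, ?)",
        [(1, json.dumps({"source": "external-system"})), (1, json.dumps({}))],
    )
    return conn


def dataset_extras(conn: sqlite3.Connection, uris) -> dict:
    """Look up the dataset-level extra for each URI with a single query."""
    uris = list(uris)
    if not uris:
        return {}
    placeholders = ",".join("?" * len(uris))
    rows = conn.execute(
        f"SELECT uri, extra FROM dataset WHERE uri IN ({placeholders})", uris
    ).fetchall()
    return {uri: json.loads(extra) for uri, extra in rows}


# Stand-in for the mapping a task receives: dataset URI -> list of events.
triggering_dataset_events = {"s3://bucket/orders": ["event-1", "event-2"]}

conn = build_demo_db()
extras = dataset_extras(conn, triggering_dataset_events.keys())
print(extras)
```

In a real deployment the equivalent query would go against Airflow's metadata database through its own models; the sketch only shows the shape of a key-based lookup against a normalized table.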
