blag commented on PR #36075:
URL: https://github.com/apache/airflow/pull/36075#issuecomment-1843333200

   Please refer to the PR implementing this, and the discussion about an 
`extra` parameter 
[here](https://github.com/apache/airflow/pull/25419#pullrequestreview-1061066994).
   
   IIRC, the intent here was _not_ to break database normalization by copying 
`DatasetModel.extra` into every `DatasetEvent`; it was to let third-party 
`DatasetEventManager`s record additional data from external sources 
(including non-Airflow sources) with a minimum of fuss. The `extra` fields of 
`DatasetModel` and `DatasetEvent` are similarly named, but they serve distinct 
purposes, and I did _not_ intend for `DatasetEvent.extra` to be 
copied from `DatasetModel.extra`.
   
   Given that, I would not expect `DatasetEvent.extra` to be used within 
Airflow itself: if we wanted to store more information in the `DatasetEvent` 
table, we would just update the database schema. However, just because it is 
not used internally within Airflow does not mean it is not useful.
   
   IMO #35297 is invalid, because tasks should be able to query the Airflow DB 
directly for the dataset extra field using the keys in 
`triggering_dataset_events`, and should _absolutely not_ expect the dataset 
extra field to be copied to all dataset event extra fields.
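   A minimal sketch of the lookup pattern described above, using plain Python stand-ins rather than Airflow's real ORM models. The `Dataset` and `Event` classes, the `DATASETS` mapping, and the `dataset_extra_for` helper are all hypothetical. The sketch assumes `triggering_dataset_events` maps dataset URIs to lists of events. The dataset-level `extra` is stored once, each event carries only its own `extra`, and the task resolves dataset metadata by URI instead of expecting it on every event. In a real deployment, the lookup would instead query Airflow's metadata database, for example `DatasetModel` rows filtered by URI.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical stand-ins for the two tables; NOT Airflow's actual models.
@dataclass
class Dataset:
    uri: str
    extra: dict[str, Any] = field(default_factory=dict)  # dataset-level metadata, stored once


@dataclass
class Event:
    dataset_uri: str
    extra: dict[str, Any] = field(default_factory=dict)  # event-specific data only


# Normalized "table": one row per dataset, keyed by URI (hypothetical example data).
DATASETS: dict[str, Dataset] = {
    "s3://bucket/orders.parquet": Dataset("s3://bucket/orders.parquet", {"owner": "sales"}),
}


def dataset_extra_for(triggering_events: dict[str, list[Event]]) -> dict[str, dict[str, Any]]:
    """Resolve dataset-level extra by URI instead of reading it off each event."""
    return {uri: DATASETS[uri].extra for uri in triggering_events}


# Shaped like the assumed triggering_dataset_events mapping: URI -> list of events.
events = {
    "s3://bucket/orders.parquet": [
        Event("s3://bucket/orders.parquet", {"source": "external-producer"}),
    ],
}
print(dataset_extra_for(events))
```

The design choice this illustrates is the one argued for in the comment. Event `extra` holds only what the producer recorded for that event, so dataset metadata has a single source of truth and never needs to be copied per event.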


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
