The GitHub Actions job "Tests" on airflow.git has failed. Run started by GitHub user mpeteuil (triggered by mpeteuil).
Head commit for run: c7c1ee626bb1c50ee6c9cbeb6480e553748c69a2 / Michael Peteuil <[email protected]>

Make Datasets hashable

Currently DAGs accept a [`Collection["Dataset"]`](https://github.com/apache/airflow/blob/0c02ead4d8a527cbf0a916b6344f255c520e637f/airflow/models/dag.py#L171) as an option for the `schedule`, but that collection cannot be a `set` because Datasets are not a hashable type.

The interesting thing is that [the `DatasetModel` is actually already hashable](https://github.com/apache/airflow/blob/dec78ab3f140f35e507de825327652ec24d03522/airflow/models/dataset.py#L93-L100), so this introduces a bit of duplication, since it is the same implementation. However, Airflow users primarily interface with `Dataset`, not `DatasetModel`, so I think it makes sense for `Dataset` to be hashable. I'm not sure how to square the duplication, or what `__eq__` and `__hash__` provide for `DatasetModel`, though.

There was discussion [on the original PR that created the `DatasetModel`](https://github.com/apache/airflow/pull/24613) about whether to create two classes or one. In that discussion @kaxil mentioned:

> I would slightly favour a separate `DatasetModel` and `Dataset` so `Dataset` becomes an extensible class, and `DatasetModel` just stores the info about the class. So users don't need to care about SQLAlchemy stuff when extending it.

That provides a bit of background on why they both exist, for anyone who is curious.

Report URL: https://github.com/apache/airflow/actions/runs/7923363313

With regards,
GitHub Actions via GitBox

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
