The GitHub Actions job "Tests" on airflow.git has failed.
Run started by GitHub user mpeteuil (triggered by mpeteuil).

Head commit for run:
c7c1ee626bb1c50ee6c9cbeb6480e553748c69a2 / Michael Peteuil 
<[email protected]>
Make Datasets hashable

Currently DAGs accept a
[`Collection["Dataset"]`](https://github.com/apache/airflow/blob/0c02ead4d8a527cbf0a916b6344f255c520e637f/airflow/models/dag.py#L171)
as an option for the `schedule`, but that collection cannot be a `set`
because Datasets are not a hashable type. The interesting thing is that
[the `DatasetModel` is actually already
hashable](https://github.com/apache/airflow/blob/dec78ab3f140f35e507de825327652ec24d03522/airflow/models/dataset.py#L93-L100),
so this change introduces a bit of duplication, since it uses the same
implementation. However, Airflow users primarily interface with
`Dataset`, not `DatasetModel`, so I think it makes sense for `Dataset`
to be hashable. I'm not sure how to square the duplication, or what
`__eq__` and `__hash__` provide for `DatasetModel`, though.
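
The linked `DatasetModel` implementation amounts to equality and hashing on the dataset's `uri`. A minimal sketch of applying the same idea to a `Dataset`-like class (simplified, not Airflow's exact code) shows why this is enough to let instances live in a `set`:

```python
# Sketch only: a Dataset-like class made hashable on its `uri`,
# mirroring the __eq__/__hash__ pattern used by DatasetModel.
from __future__ import annotations


class Dataset:
    def __init__(self, uri: str, extra: dict | None = None):
        self.uri = uri
        self.extra = extra

    def __eq__(self, other: object) -> bool:
        # Equality is defined by `uri` alone.
        if isinstance(other, self.__class__):
            return self.uri == other.uri
        return NotImplemented

    def __hash__(self) -> int:
        # Hash on `uri` only, consistent with __eq__,
        # so equal datasets land in the same hash bucket.
        return hash(self.uri)


# With both methods defined, a set of Datasets deduplicates by URI:
schedule = {Dataset("s3://a"), Dataset("s3://a"), Dataset("s3://b")}
print(len(schedule))  # → 2
```

The key constraint is that `__hash__` must only use fields that `__eq__` compares; hashing on `uri` while comparing `extra` as well would break set semantics.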

There was discussion [on the original PR that created the
`DatasetModel`](https://github.com/apache/airflow/pull/24613) about
whether to create two classes or one. In that discussion @kaxil
mentioned:

> I would slightly favour a separate `DatasetModel` and `Dataset` so
> `Dataset` becomes an extensible class, and `DatasetModel` just stores
> the info about the class. So users don't need to care about SQLAlchemy
> stuff when extending it.

That provides a bit of background on why they both exist for anyone who
is curious.

Report URL: https://github.com/apache/airflow/actions/runs/7923363313

With regards,
GitHub Actions via GitBox


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
