stijndehaes commented on code in PR #40806:
URL: https://github.com/apache/airflow/pull/40806#discussion_r1680931606
##########
airflow/jobs/scheduler_job_runner.py:
##########
@@ -1901,6 +1901,7 @@ def _orphan_unreferenced_datasets(self, session: Session
= NEW_SESSION) -> None:
isouter=True,
)
.group_by(DatasetModel.id)
+ .where(~DatasetModel.is_orphaned)
Review Comment:
Hi @shahar1,
Thansk for the feedback!
I have thought about creating/updating a test but I believe it does not make
sense.
There is currently this test here:
https://github.com/apache/airflow/blob/63662044583031fc27d98af02f2913d324245db0/tests/jobs/test_scheduler_job.py#L5639
The logic used to be as follows if we have:
- dataset 1 (already orphaned)
- dataset 2 (not orphaned, but has no more references)
- dataset 3 (not orphaned, has references)
The result before my change would be:
- dataset 1 (orphaned, but set to orphaned again)
- dataset 2 (orphaned)
- dataset 3 (not orphaned)
The result after my change would be:
- dataset 1 (orphaned, but not updated)
- dataset 2 (orphaned)
- dataset 3 (not orphaned)
So in the end the result is the same, the only think I can maybe check is if
the updatedAt field of dataset 1 has changed. If you think that is usefull I
can try to write a test around this.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]