blag commented on code in PR #26942:
URL: https://github.com/apache/airflow/pull/26942#discussion_r992640772
##########
airflow/www/views.py:
##########
@@ -3574,13 +3586,27 @@ def datasets_summary(self):
DatasetModel.id,
DatasetModel.uri,
)
- .filter(DatasetModel.uri.ilike(f"%{uri_pattern}%"))
.order_by(*order_by)
- .offset(offset)
- .limit(limit)
- .all()
- ]
- data = {"datasets": datasets, "total_entries": total_entries}
+ )
+
+ if updated_before or updated_after:
+ count_query = count_query.outerjoin(DatasetEvent, DatasetEvent.dataset_id == DatasetModel.id)
Review Comment:
For the unfiltered query we want outer joins.
But for the filtered results query and the filtered count query, doing
an outer join and then filtering on the joined table's columns in a WHERE
clause is effectively an inner join, right?
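A minimal sketch of that equivalence, using the standard-library `sqlite3` module rather than the actual Airflow models. The table names, columns, sample rows, and the `count_datasets` helper are all hypothetical and only loosely mirror `DatasetModel` and `DatasetEvent`:

```python
import sqlite3

# Hypothetical stand-ins for the dataset and dataset-event tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (id INTEGER PRIMARY KEY, uri TEXT);
    CREATE TABLE dataset_event (
        id INTEGER PRIMARY KEY,
        dataset_id INTEGER,
        timestamp TEXT
    );
    INSERT INTO dataset VALUES (1, 's3://a'), (2, 's3://b'), (3, 's3://c');
    -- Dataset 3 has no events at all.
    INSERT INTO dataset_event VALUES
        (1, 1, '2022-10-01'),
        (2, 1, '2022-10-05'),
        (3, 2, '2022-09-01');
    """
)


def count_datasets(join_kind, after=None):
    """Count distinct datasets, optionally filtering on the event timestamp."""
    sql = (
        "SELECT COUNT(DISTINCT d.id) FROM dataset d "
        f"{join_kind} JOIN dataset_event e ON e.dataset_id = d.id"
    )
    params = ()
    if after is not None:
        # A comparison against NULL is never true, so rows that the outer
        # join padded with NULLs are dropped here.
        sql += " WHERE e.timestamp > ?"
        params = (after,)
    return conn.execute(sql, params).fetchone()[0]


# Without a filter the join type matters: only the outer join keeps dataset 3.
unfiltered_outer = count_datasets("LEFT OUTER")
unfiltered_inner = count_datasets("INNER")

# With a filter on the event column, both join types return the same count.
filtered_outer = count_datasets("LEFT OUTER", after="2022-09-15")
filtered_inner = count_datasets("INNER", after="2022-09-15")
```

This only holds when the WHERE condition rejects NULLs in the joined table's columns. A filter such as `e.timestamp IS NULL` would keep the padded rows, and the two join types would differ again.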
I (possibly naively!) would expect most query planners to optimize that
into an inner join, so the performance difference should be negligible, if
there is any at all. And performance is always such a fickle thing to plan
for preemptively and to measure after the fact.
Part of me thinks we should keep the count query as similar to the results
query as possible: use outer joins and filter later on.
Thoughts?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.