wjddn279 commented on code in PR #55589:
URL: https://github.com/apache/airflow/pull/55589#discussion_r2349128886
##########
airflow-core/src/airflow/models/serialized_dag.py:
##########
@@ -460,7 +460,9 @@ def write_dag(
     @classmethod
     def latest_item_select_object(cls, dag_id):
-        return select(cls).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1)
+        # prevent "Out of sort memory" caused by large values in cls.data column
+        latest_item_id = select(cls.id).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1)
+        return select(cls).where(cls.id == latest_item_id)
Review Comment:
@ashb Thank you for the review.
As you said, this query is optimized for MySQL and not for Postgres or
SQLite. The change resolves the issue for MySQL, and I expect no significant
impact on performance or functionality for Postgres and SQLite. (If your
comment was about performance specifically, please let me know.)
Regarding readability, I considered two approaches:
- Branching based on the engine type
- Keeping a single query
I opted for the second approach because branching would introduce patterns
not currently used in the project’s query generation and could compromise the
intended ORM usage.
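
For illustration, here is a minimal sketch of the two query shapes written as
raw SQL against an in-memory SQLite database, using only the standard-library
sqlite3 module. The table layout, sample rows, and function names are
assumptions made up for this example; it is not the project's ORM code. It
only shows that the single-query form returns the same row as the original
form on an engine that does not need the workaround.

```python
import sqlite3

COLUMNS = "id, dag_id, created_at, data"


def build_demo_db():
    # Hypothetical, simplified stand-in for the serialized_dag table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag ("
        "id TEXT PRIMARY KEY, dag_id TEXT, created_at TEXT, data TEXT)"
    )
    rows = [
        ("a1", "dag_a", "2025-01-01T00:00:00", "x" * 1000),
        ("a2", "dag_a", "2025-01-02T00:00:00", "y" * 1000),
        ("b1", "dag_b", "2025-01-03T00:00:00", "z" * 1000),
    ]
    conn.executemany("INSERT INTO serialized_dag VALUES (?, ?, ?, ?)", rows)
    return conn


def latest_row_direct(conn, dag_id):
    # Original shape: order full rows (including the large data column)
    # and keep the newest one.
    return conn.execute(
        f"SELECT {COLUMNS} FROM serialized_dag "
        "WHERE dag_id = ? ORDER BY created_at DESC LIMIT 1",
        (dag_id,),
    ).fetchone()


def latest_row_via_id(conn, dag_id):
    # Revised shape: order only the ids in a scalar subquery, then fetch
    # the single matching row by id.
    return conn.execute(
        f"SELECT {COLUMNS} FROM serialized_dag WHERE id = ("
        "SELECT id FROM serialized_dag "
        "WHERE dag_id = ? ORDER BY created_at DESC LIMIT 1)",
        (dag_id,),
    ).fetchone()
```

Both functions take the same arguments, so a caller would not need to branch
on the engine type; only the SQL text differs.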
This does leave a query that looks somewhat unconventional on engines that do
not need the workaround, but I believe it is understandable with the
accompanying comment. Opinions may vary, of course; if this is a concern,
adding an index on created_at could be considered as an alternative.