wjddn279 commented on code in PR #55589:
URL: https://github.com/apache/airflow/pull/55589#discussion_r2349128886


##########
airflow-core/src/airflow/models/serialized_dag.py:
##########
@@ -460,7 +460,9 @@ def write_dag(
 
     @classmethod
     def latest_item_select_object(cls, dag_id):
-        return select(cls).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1)
+        # prevent "Out of sort memory" caused by large values in cls.data column
+        latest_item_id = select(cls.id).where(cls.dag_id == dag_id).order_by(cls.created_at.desc()).limit(1)
+        return select(cls).where(cls.id == latest_item_id)

Review Comment:
   @ashb Thank you for the review.
   
   As you said, this query is optimized for MySQL rather than for Postgres or
   SQLite. The change resolves the issue on MySQL, and I expect no significant
   impact on performance or functionality on Postgres or SQLite. (If your
   comment was specifically about performance, please let me know.)
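
   To make the comparison concrete, here is a minimal sketch of the two query
   shapes, run against an in-memory SQLite database. It is not the PR code:
   the table name `serialized_dag_sketch`, the sample rows, and the helper
   names are hypothetical, and the SQL only approximates what the ORM would
   emit. It checks only that both forms return the same row, not how any
   engine sorts internally.

```python
import sqlite3

# Approximate shape of the original query: ORDER BY runs over full rows,
# including the large data column.
OLD_SQL = """
SELECT id, dag_id, created_at, data
FROM serialized_dag_sketch
WHERE dag_id = ?
ORDER BY created_at DESC
LIMIT 1
"""

# Approximate shape of the proposed query: only ids are sorted in the
# subquery, then the single matching row is fetched by primary key.
NEW_SQL = """
SELECT id, dag_id, created_at, data
FROM serialized_dag_sketch
WHERE id = (
    SELECT id
    FROM serialized_dag_sketch
    WHERE dag_id = ?
    ORDER BY created_at DESC
    LIMIT 1
)
"""


def build_demo_db():
    """Create a hypothetical stand-in table with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag_sketch ("
        "id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL, "
        "created_at TEXT NOT NULL, data TEXT)"
    )
    rows = [
        (1, "dag_a", "2025-01-01T00:00:00", "x" * 1000),
        (2, "dag_a", "2025-01-03T00:00:00", "y" * 1000),
        (3, "dag_a", "2025-01-02T00:00:00", "z" * 1000),
        (4, "dag_b", "2025-01-04T00:00:00", "w" * 1000),
    ]
    conn.executemany("INSERT INTO serialized_dag_sketch VALUES (?, ?, ?, ?)", rows)
    return conn


def latest_row(conn, sql, dag_id):
    """Run one of the two query shapes and return the row, or None."""
    return conn.execute(sql, (dag_id,)).fetchone()
```

   Calling `latest_row(conn, OLD_SQL, "dag_a")` and
   `latest_row(conn, NEW_SQL, "dag_a")` on the demo database should give the
   same row, which is the functional equivalence I am relying on above.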
   
   Regarding readability, I considered two approaches:
   - Branching based on the engine type
   - Keeping a single query
   
   I opted for the second approach because branching would introduce patterns
   not currently used in the project’s query generation and could compromise
   the intended ORM usage.
   
   While the resulting query looks somewhat unconventional on some engines, I
   believe it is understandable with the accompanying comment. Of course,
   opinions may vary, and if this is a concern, adding an index on
   `created_at` could be considered as an alternative.
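
   For reference, a rough sketch of what the index alternative amounts to,
   again on a hypothetical SQLite stand-in table. The index name
   `idx_sketch_created_at` and the helpers are assumptions for illustration;
   in the project this would presumably be a schema migration, and whether a
   given engine's planner actually uses the index for this query would need
   to be checked with that engine's EXPLAIN output.

```python
import sqlite3


def create_sketch_with_index():
    """Create the stand-in table plus an index on created_at."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE serialized_dag_sketch ("
        "id INTEGER PRIMARY KEY, dag_id TEXT NOT NULL, "
        "created_at TEXT NOT NULL, data TEXT)"
    )
    # Hypothetical index name; the comment only proposes indexing created_at.
    conn.execute(
        "CREATE INDEX idx_sketch_created_at ON serialized_dag_sketch (created_at)"
    )
    return conn


def indexed_columns(conn, index_name):
    """List the columns covered by an index (PRAGMA rows: seqno, cid, name)."""
    return [row[2] for row in conn.execute(f"PRAGMA index_info({index_name})")]


def query_plan(conn, dag_id):
    """Return SQLite's plan rows for the original single-query form."""
    sql = (
        "EXPLAIN QUERY PLAN SELECT * FROM serialized_dag_sketch "
        "WHERE dag_id = ? ORDER BY created_at DESC LIMIT 1"
    )
    return conn.execute(sql, (dag_id,)).fetchall()
```

   Printing `query_plan(conn, "dag_a")` shows how SQLite intends to execute
   the original query with the index in place; the equivalent check on MySQL
   is what would matter for the "Out of sort memory" case.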



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

