Vamsi-klu commented on PR #63591:
URL: https://github.com/apache/airflow/pull/63591#issuecomment-4061358945

   Hey @Pranaykarvi, took a look at this — the lock exhaustion problem is real 
and I've seen similar issues on large Postgres deployments, so this is a 
welcome fix. A few things I noticed while reading through the diff:
   
   **The upgrade path has the same problem.** 
`migrate_existing_deadline_alert_data_from_serialized_dag()` still uses 
`conn.begin_nested()` with the same per-DAG savepoint pattern. It actually does 
*more* work per DAG (INSERT + two UPDATEs + reads), so it'd hit 
`max_locks_per_transaction` even sooner than the downgrade. Might be worth 
fixing both paths in one go.
   
   **Validation checks could move outside the `with engine.begin()` block.** 
Right now the checks for non-list data, non-string values, and missing deadline 
data are inside the transaction context. When they hit `continue`, Python exits 
the `with` block normally — which opens a connection, begins a transaction, and 
immediately commits it without doing any work. For most DAGs (which won't have 
deadline data), that's a lot of empty round-trips. Moving those checks before 
`with engine.begin()` would avoid that.
   
   **Minor: the error message lost some context.** The original had `f"Could 
not restore deadline: {e}"` in `dags_with_errors`, but the new code just 
appends `str(e)`. The `log.error` call still has context, but `report_errors()` 
downstream will now get bare exception messages without the "Could not restore 
deadline" prefix.
   
   **One thing to double-check: SQLite.** `engine.begin()` opens a separate 
connection from the pool, while the Alembic `conn` still holds its own 
connection. On SQLite (which does database-level locking), the second 
connection could hit "database is locked" if the first connection's transaction 
hasn't been finalized. The original savepoint approach doesn't have this issue 
on SQLite since it stays on one connection. Might be worth keeping savepoints 
for SQLite specifically, since it doesn't have the lock exhaustion problem 
anyway.
   
   Overall the direction looks right to me — just wanted to flag these before 
it goes further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to