Vamsi-klu commented on PR #63591:
URL: https://github.com/apache/airflow/pull/63591#issuecomment-4061358945
Hey @Pranaykarvi, took a look at this — the lock exhaustion problem is real
and I've seen similar issues on large Postgres deployments, so this is a
welcome fix. A few things I noticed while reading through the diff:
**The upgrade path has the same problem.**
`migrate_existing_deadline_alert_data_from_serialized_dag()` still uses
`conn.begin_nested()` with the same per-DAG savepoint pattern. It actually does
*more* work per DAG (INSERT + two UPDATEs + reads), so it'd hit
`max_locks_per_transaction` even sooner than the downgrade. Might be worth
fixing both paths in one go.
**Validation checks could move outside the `with engine.begin()` block.**
Right now the checks for non-list data, non-string values, and missing deadline
data are inside the transaction context. When they hit `continue`, Python exits
the `with` block normally — which opens a connection, begins a transaction, and
immediately commits it without doing any work. For most DAGs (which won't have
deadline data), that's a lot of empty round-trips. Moving those checks before
`with engine.begin()` would avoid that.
**Minor: the error message lost some context.** The original had `f"Could
not restore deadline: {e}"` in `dags_with_errors`, but the new code just
appends `str(e)`. The `log.error` call still has context, but `report_errors()`
downstream will now get bare exception messages without the "Could not restore
deadline" prefix.
**One thing to double-check: SQLite.** `engine.begin()` opens a separate
connection from the pool, while the Alembic `conn` still holds its own
connection. On SQLite (which does database-level locking), the second
connection could hit "database is locked" if the first connection's transaction
hasn't been finalized. The original savepoint approach doesn't have this issue
on SQLite since it stays on one connection. Might be worth keeping savepoints
for SQLite specifically, since it doesn't have the lock exhaustion problem
anyway.
Overall the direction looks right to me — just wanted to flag these before
it goes further.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]