Bishesh-Shahi opened a new pull request, #66932: URL: https://github.com/apache/airflow/pull/66932
Closes #66853.

## Problem

Under high concurrency (80+ simultaneous task completions emitting asset events), the API server dies with OOMKill. The root cause is a DB lock contention chain:

1. `ti_update_state()` acquires `SELECT task_instance ... WITH FOR UPDATE`, holding a PostgreSQL row lock.
2. While holding that lock, `register_asset_changes_in_db()` runs multiple slow queries including `asset_alias_model.asset_events.append(asset_event)`. This ORM `.append()` lazy-loads the **entire** `asset_events` collection for the alias.
3. Each slow query leaves the connection idle in transaction while Python processes results. New workers needing `SELECT task_instance FOR UPDATE` on the same row queue up, each holding a FastAPI threadpool thread.
4. With 80+ concurrent completions, thread count grows unbounded until OOMKill.

## Fix

Two changes:

**1. `AssetManager.register_asset_change()` (`assets/manager.py`)**: Replace `asset_alias_model.asset_events.append(asset_event)` + `session.add(asset_alias_model)` with a direct `INSERT INTO asset_alias_asset_event (alias_id, event_id)`. This eliminates the lazy-load of the existing events collection (which can be thousands of rows) while the `task_instance` row lock is held.

**2. `ti_update_state()` (`execution_api/routes/task_instances.py`)**: Add `session.commit()` after the TI state UPDATE and Log writes to release the `task_instance` row lock before running asset registration. Asset registration then runs in a fresh implicit transaction. Registration failures are logged and swallowed -- the task state is already durable at that point.

## Testing

- New: `test_register_asset_change_with_alias_no_lazy_load` -- confirms no SELECT on `asset_alias_asset_event` collection during registration when pre-existing rows exist
- New: `test_ti_update_state_to_success_asset_registration_failure_returns_204` -- confirms 204 + TI SUCCESS when asset registration raises after commit

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
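
The two changes described in the Fix section can be illustrated with a minimal, self-contained sketch. It is not the Airflow code: it uses the standard-library `sqlite3` module instead of SQLAlchemy and PostgreSQL, so it shows only the transaction ordering, not real row-lock behavior. The table `asset_alias_asset_event` and its columns `alias_id` and `event_id` come from the PR description; the functions `link_alias_to_event`, `finish_task`, and `demo`, and the simplified `task_instance` schema, are hypothetical names chosen for illustration.

```python
import logging
import sqlite3

log = logging.getLogger("asset_registration_sketch")

# Simplified, hypothetical schema; the real Airflow tables have more columns.
SCHEMA = """
CREATE TABLE task_instance (id INTEGER PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE asset_alias_asset_event (
    alias_id INTEGER NOT NULL,
    event_id INTEGER NOT NULL,
    PRIMARY KEY (alias_id, event_id)
);
"""


def link_alias_to_event(conn, alias_id, event_id):
    """Write one association row directly, without reading existing rows."""
    conn.execute(
        "INSERT INTO asset_alias_asset_event (alias_id, event_id) VALUES (?, ?)",
        (alias_id, event_id),
    )


def finish_task(conn, ti_id, register_assets):
    """Commit the state change first, then attempt asset registration."""
    # Transaction 1: only the state update. Committing here ends the
    # transaction before any slower registration work starts.
    conn.execute("UPDATE task_instance SET state = 'success' WHERE id = ?", (ti_id,))
    conn.commit()

    # Transaction 2: registration. A failure is logged and rolled back,
    # but the already committed task state is unaffected.
    try:
        register_assets(conn)
        conn.commit()
    except Exception:
        conn.rollback()
        log.exception("Asset registration failed for task instance %s", ti_id)
    return 204


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO task_instance (id, state) VALUES (1, 'running'), (2, 'running')")
    conn.commit()

    def ok(c):
        link_alias_to_event(c, alias_id=10, event_id=100)

    def broken(c):
        link_alias_to_event(c, alias_id=10, event_id=101)
        raise RuntimeError("simulated registration failure")

    statuses = [finish_task(conn, 1, ok), finish_task(conn, 2, broken)]
    states = [row[0] for row in conn.execute("SELECT state FROM task_instance ORDER BY id")]
    links = conn.execute(
        "SELECT alias_id, event_id FROM asset_alias_asset_event ORDER BY event_id"
    ).fetchall()
    conn.close()
    return statuses, states, links
```

Calling `demo()` runs one successful and one failing registration. In both cases the task instance ends in the `success` state, and only the association row from the successful registration is kept, which mirrors the behavior the second new test is described as checking.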
