rusackas commented on code in PR #40237:
URL: https://github.com/apache/superset/pull/40237#discussion_r3292062115
##########
superset/models/core.py:
##########
@@ -565,10 +571,29 @@ def _get_sqla_engine( # pylint: disable=too-many-locals
# noqa: C901
security_manager,
source,
)
+ # Per-process engine cache (#27897). SQLAlchemy expects
``create_engine``
+ # to be called once per process per URL so its connection pool can do
+ # its job. Recreating the engine every call defeats the pool that
+ # operators configure via ``DB_CONNECTION_MUTATOR`` (e.g. duckdb with a
+ # size-1 queue). Skip the cache when ``nullpool`` is True — those
+ # engines are intentionally poolless and there's nothing to reuse.
+ cache_key: tuple[Any, ...] | None = None
+ if not nullpool:
Review Comment:
Spot on — confirmed: every in-tree caller uses the default `nullpool=True`,
so the cache as written was dormant exactly where it needs to engage. Fixed in
05557350ab by removing the `if not nullpool:` gate so caching happens
regardless. Even a NullPool engine has nontrivial construction cost (URL
parsing, dialect resolution, connect_args setup, re-running
DB_CONNECTION_MUTATOR), and the operator pool config #27897 is about can only
persist if the engine object itself is reused.
Also updated the regression test to call with the default `nullpool=True` so
it actually exercises the production path — if it had done that originally, you
wouldn't have had to catch this for me. Thanks for the careful read!
##########
superset/models/core.py:
##########
@@ -94,6 +94,12 @@
metadata = Model.metadata # pylint: disable=no-member
logger = logging.getLogger(__name__)
+# Per-process SQLAlchemy engine cache (#27897). Key is
+# (database_id, str(sqlalchemy_url), repr(sorted(engine_kwargs.items()))).
+# Populated only when ``nullpool=False`` — pooled engines are the only ones
+# that benefit from process-wide reuse.
+_ENGINE_CACHE: dict[tuple[Any, ...], Engine] = {}
Review Comment:
Good question. The cache key is `(database_id, str(sqlalchemy_url),
repr(sorted(engine_kwargs.items())))`:
- **Password rotation / host change / SSH tunnel reconfig**:
`sqlalchemy_url` is the *decrypted* URL (built from the latest Database fields
each call), and `engine_kwargs` includes whatever `DB_CONNECTION_MUTATOR`
produces from the latest Database state. So a rotated password or changed host
means a different key on the next call → cache miss → fresh engine. Existing
in-flight requests on the old engine continue against the old credentials until
they finish, which matches SQLAlchemy's normal behavior.
- **Deletion**: stale entries linger until process restart — that's a memory
footprint concern (a few hundred bytes per dead entry) rather than a
correctness one. I'd rather not couple this PR to a SQLAlchemy event hook in
`Database.__delete__` since the right invalidation surface is bigger than just
delete (would also want it on rename/clone), and worth scoping as a follow-up.
Updated the cache header comment to make the URL/kwargs key contract
explicit.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]