Re: [PR] test(database): regression test for sqla engine creation (#27897) [superset]

via GitHub Fri, 22 May 2026 21:16:11 -0700


rusackas commented on code in PR #40237:
URL: https://github.com/apache/superset/pull/40237#discussion_r3292062115



##########
superset/models/core.py:
##########
@@ -565,10 +571,29 @@ def _get_sqla_engine(  # pylint: disable=too-many-locals  
# noqa: C901
                 security_manager,
                 source,
             )
+        # Per-process engine cache (#27897). SQLAlchemy expects 
``create_engine``
+        # to be called once per process per URL so its connection pool can do
+        # its job. Recreating the engine every call defeats the pool that
+        # operators configure via ``DB_CONNECTION_MUTATOR`` (e.g. duckdb with a
+        # size-1 queue). Skip the cache when ``nullpool`` is True — those
+        # engines are intentionally poolless and there's nothing to reuse.
+        cache_key: tuple[Any, ...] | None = None
+        if not nullpool:

Review Comment:
   Spot on — confirmed: every in-tree caller uses the default `nullpool=True`, 
so the cache as written was dormant exactly where it needs to engage. Fixed in 
05557350ab by removing the `if not nullpool:` gate so caching happens 
regardless. Even a NullPool engine has nontrivial construction cost (URL 
parsing, dialect resolution, connect_args setup, re-running 
DB_CONNECTION_MUTATOR), and the operator pool config #27897 is about can only 
persist if the engine object itself is reused.
   
   Also updated the regression test to call with the default `nullpool=True` so 
it actually exercises the production path — if it had done that originally, you 
wouldn't have had to catch this for me. Thanks for the careful read!



##########
superset/models/core.py:
##########
@@ -94,6 +94,12 @@
 metadata = Model.metadata  # pylint: disable=no-member
 logger = logging.getLogger(__name__)
 
+# Per-process SQLAlchemy engine cache (#27897). Key is
+# (database_id, str(sqlalchemy_url), repr(sorted(engine_kwargs.items()))).
+# Populated only when ``nullpool=False`` — pooled engines are the only ones
+# that benefit from process-wide reuse.
+_ENGINE_CACHE: dict[tuple[Any, ...], Engine] = {}

Review Comment:
   Good question. The cache key is `(database_id, str(sqlalchemy_url), 
repr(sorted(engine_kwargs.items())))`:
   
   - **Password rotation / host change / SSH tunnel reconfig**: 
`sqlalchemy_url` is the *decrypted* URL (built from the latest Database fields 
each call), and `engine_kwargs` includes whatever `DB_CONNECTION_MUTATOR` 
produces from the latest Database state. So a rotated password or changed host 
means a different key on the next call → cache miss → fresh engine. Existing 
in-flight requests on the old engine continue against the old credentials until 
they finish, which matches SQLAlchemy's normal behavior.
   - **Deletion**: stale entries linger until process restart — that's a memory 
footprint concern (a few hundred bytes per dead entry) rather than a 
correctness one. I'd rather not couple this PR to a SQLAlchemy event hook in 
`Database.__delete__` since the right invalidation surface is bigger than just 
delete (would also want it on rename/clone), and worth scoping as a follow-up.
   
   Updated the cache header comment to make the URL/kwargs key contract 
explicit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] test(database): regression test for sqla engine creation (#27897) [superset]

Reply via email to