asf-tooling commented on issue #299:
URL:
https://github.com/apache/tooling-trusted-releases/issues/299#issuecomment-4403370378
<!-- gofannon-issue-triage-bot v2 -->
**Automated triage** — analyzed at `main@837830e8`
**Type:** `refactor` • **Classification:** `actionable` •
**Confidence:** `medium`
**Application domain(s):** `data_storage`, `infrastructure`
### Summary
The issue requests preventing reuse of detached SQLAlchemy session objects
when passed into database commit contexts. This is a recurring bug pattern
(referenced as cause of #297) where model objects fetched in one session are
used in another session's commit context, causing DetachedInstanceError or
silent data corruption. The fix should be at the type level, making it harder
to accidentally pass stale/detached objects across session boundaries.
### Where this lives in the code today
#### `atr/db/__init__.py` — `ensure_session` (lines 833-839)
_needs modification_
This is the primary entry point where a caller-provided session (potentially
containing detached objects from another context) gets reused. The pattern of
passing `caller_data: db.Session | None` allows detached objects to be accessed
in wrong contexts.
```python
@contextlib.asynccontextmanager
async def ensure_session(caller_data: db.Session | None = None) ->
AsyncGenerator[Session]:
if caller_data is not None:
yield caller_data
else:
async with session() as data:
yield data
```
#### `atr/db/__init__.py` — `session` (lines 857-861)
_currently does this_
Creates a new session context. Objects fetched within this context become
detached once it exits, but can still be passed to other functions that operate
on different sessions.
```python
@contextlib.asynccontextmanager
async def session(log_queries: bool = False) -> AsyncGenerator[Session]:
async with sessionmaker()() as data:
yield data
```
#### `atr/db/interaction.py` — `all_releases` (lines 88-111)
_currently does this_
Example of the problematic pattern: objects fetched in one session scope are
modified outside it (release.project = project), and the `project` parameter
itself could be a detached object from another session.
```python
async def all_releases(project: sql.Project) -> list[sql.Release]:
"""Get all releases for the project, sorted by version."""
query = sqlmodel.select(sql.Release).where(sql.Release.project_key ==
project.key)
results = []
async with db.session() as data:
for result in (await data.execute(query)).all():
release = result[0]
results.append(release)
for release in results:
release.project = project
try:
# This rejects any non PEP 440 versions
results.sort(key=lambda r: version.Version(r.version), reverse=True)
except Exception as e:
```
#### `atr/db/interaction.py` — `releases_by_phase` (lines 330-350)
_currently does this_
Another instance of the pattern where a detached `project` object is
assigned to newly-fetched releases after the session closes, which can cause
issues if those releases are later used in a commit context.
```python
async def releases_by_phase(project: sql.Project, phase: sql.ReleasePhase)
-> list[sql.Release]:
"""Get the releases for the project by phase."""
query = (
sqlmodel.select(sql.Release)
.where(
sql.Release.project_key == project.key,
sql.Release.phase == phase,
)
.order_by(sql.validate_instrumented_attribute(sql.Release.created).desc())
)
results = []
async with db.session() as data:
for result in (await data.execute(query)).all():
release = result[0]
results.append(release)
for release in results:
# Don't need to eager load and lose it when the session closes
release.project = project
return results
```
#### `atr/storage/__init__.py` — `Write` (lines 217-303)
_needs modification_
The Write class holds a db.Session reference and passes it to WriteAs*
classes. If model objects from a different session are added/merged in this
session's commit context, DetachedInstanceError occurs.
```python
class Write:
# Read and Write have authenticator methods which return access outcomes
# TODO: Still need to send some runtime credentials guarantee to the
WriteAs* classes
def __init__(self, authorisation: principal.Authorisation, data:
db.Session):
self.__authorisation: Final[principal.Authorisation] = authorisation
self.__data: Final[db.Session] = data
```
#### `atr/storage/__init__.py` — `WriteAsGeneralPublic` (lines 119-133)
_needs modification_
Each writer receives the raw db.Session. Writers can inadvertently add
detached objects from outside this session to the commit context.
```python
class WriteAsGeneralPublic(WriteAs):
def __init__(self, write: Write, data: db.Session):
self.announce = writers.announce.GeneralPublic(write, self, data)
self.cache = writers.cache.GeneralPublic(write, self, data)
self.checks = writers.checks.GeneralPublic(write, self, data)
self.keys = writers.keys.GeneralPublic(write, self, data)
self.mail = writers.mail.GeneralPublic(write, self, data)
self.policy = writers.policy.GeneralPublic(write, self, data)
self.project = writers.project.GeneralPublic(write, self, data)
self.release = writers.release.GeneralPublic(write, self, data)
self.revision = writers.revision.GeneralPublic(write, self, data)
self.sbom = writers.sbom.GeneralPublic(write, self, data)
self.ssh = writers.ssh.GeneralPublic(write, self, data)
self.tokens = writers.tokens.GeneralPublic(write, self, data)
self.vote = writers.vote.GeneralPublic(write, self, data)
```
#### `atr/db/__init__.py` — `Session` (lines 146-160)
_needs modification_
The Session class is where type-level guards could be added. For example,
overriding `add()` and `merge()` to check object session identity, or
introducing a branded/tagged session type that prevents mixing objects from
different sessions.
```python
class Session(sqlalchemy.ext.asyncio.AsyncSession):
def __init__(self, *args: Any, **kwargs: Any) -> None:
explicit_value_passed_by_sessionmaker = kwargs.pop("log_queries",
None)
super().__init__(*args, **kwargs)
self.log_queries: bool = global_log_query
if explicit_value_passed_by_sessionmaker is not None:
self.log_queries = explicit_value_passed_by_sessionmaker
```
### Where new code would go
- `atr/db/__init__.py` — after symbol Session
Add a type-safe wrapper or runtime guard in the Session class that detects
when a model object from a different session is being added/merged/committed.
### Proposed approach
The approach should be two-fold: (1) Add runtime detection in
`Session.add()` and `Session.merge()` to raise a clear error when a detached
object from another session is being used, and (2) Introduce type-level
distinction so that functions receiving model objects mark them as
potentially-detached (e.g., a `Detached[T]` wrapper or protocol) making it a
type error to pass them directly to session operations without explicit
re-attachment.
For the immediate fix, overriding `Session.add()` to check
`sqlalchemy.inspect(instance).detached` would catch the most common case at
runtime with a clear error message. For the type-level approach, a `NewType` or
generic wrapper could distinguish between session-bound and detached model
instances, though this requires more extensive refactoring of function
signatures throughout `atr/db/interaction.py` and the storage writers.
### Suggested patches
#### `atr/db/__init__.py`
Add runtime guards to Session.add() and Session.merge() that detect detached
objects and raise a clear error, preventing the class of bugs described in #297.
````diff
--- a/atr/db/__init__.py
+++ b/atr/db/__init__.py
@@ -146,6 +146,38 @@ class Session(sqlalchemy.ext.asyncio.AsyncSession):
def __init__(self, *args: Any, **kwargs: Any) -> None:
explicit_value_passed_by_sessionmaker = kwargs.pop("log_queries",
None)
super().__init__(*args, **kwargs)
self.log_queries: bool = global_log_query
if explicit_value_passed_by_sessionmaker is not None:
self.log_queries = explicit_value_passed_by_sessionmaker
+ def _check_not_detached(self, instance: Any, operation: str) -> None:
+ """Raise if instance is a detached SQLModel object from another
session.
+
+ This prevents the reuse of objects fetched in a different session
+ context, which causes DetachedInstanceError on commit or silent
+ data corruption.
+ """
+ if not isinstance(instance, sqlmodel.SQLModel):
+ return
+ try:
+ state = sqlalchemy.inspect(instance)
+ except sqlalchemy.exc.NoInspectionAvailable:
+ return
+ if state.detached:
+ raise RuntimeError(
+ f"Cannot {operation} a detached instance of
{type(instance).__name__} "
+ f"(identity={state.identity}). The object was likely loaded
in a "
+ f"different session. Use session.merge() with an explicit
refresh, "
+ f"or re-query the object in this session."
+ )
+
+ def add(self, instance: Any, _warn: bool = True) -> None: # type:
ignore[override]
+ self._check_not_detached(instance, "add")
+ super().add(instance, _warn=_warn)
+
+ def add_all(self, instances: Any) -> None: # type: ignore[override]
+ for instance in instances:
+ self._check_not_detached(instance, "add_all")
+ super().add_all(instances)
+
# TODO: Need to type all of these arguments correctly
async def begin_immediate(self) -> None:
````
### Open questions
- Issue #297 is referenced but not available — understanding the exact
detached-object scenario that triggered it would help refine the solution.
- Should `Session.merge()` also be guarded, or is merge the intended escape
hatch for re-attaching detached objects?
- A full type-level solution (e.g., `Detached[T]` wrapper type) would
require significant refactoring of `atr/db/interaction.py` functions that
return model objects outside session scope — is that level of investment
desired?
- Some functions in interaction.py deliberately assign detached objects
(e.g., `release.project = project` in `releases_by_phase`). These would need to
be refactored to either pass the session through or use only scalar attributes
from detached objects.
- Should the guard also check for objects that are `transient` (never
persisted) vs `detached` (were persisted in another session)?
### Files examined
- `atr/db/__init__.py`
- `atr/storage/__init__.py`
- `atr/storage/outcome.py`
- `atr/db/interaction.py`
- `atr/models/sql.py`
- `atr/storage/readers/__init__.py`
- `atr/storage/writers/__init__.py`
- `atr/principal.py`
---
*Draft from a triage agent. A human reviewer should validate before merging
any change. The agent did not run tests or verify diffs apply.*
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]