asf-tooling commented on issue #299:
URL: 
https://github.com/apache/tooling-trusted-releases/issues/299#issuecomment-4403370378

   <!-- gofannon-issue-triage-bot v2 -->
   
   **Automated triage** — analyzed at `main@837830e8`
   
   **Type:** `refactor`  •  **Classification:** `actionable`  •  
**Confidence:** `medium`
   **Application domain(s):** `data_storage`, `infrastructure`
   
   ### Summary
   The issue requests preventing reuse of detached SQLAlchemy session objects 
when passed into database commit contexts. This is a recurring bug pattern 
(referenced as cause of #297) where model objects fetched in one session are 
used in another session's commit context, causing DetachedInstanceError or 
silent data corruption. The fix should be at the type level, making it harder 
to accidentally pass stale/detached objects across session boundaries.
   
   ### Where this lives in the code today
   
   #### `atr/db/__init__.py` — `ensure_session` (lines 833-839)
   _needs modification_
   This is the primary entry point where a caller-provided session (potentially 
containing detached objects from another context) gets reused. The pattern of 
passing `caller_data: db.Session | None` allows detached objects to be accessed 
in wrong contexts.
   
   ```python
   @contextlib.asynccontextmanager
   async def ensure_session(caller_data: db.Session | None = None) -> 
AsyncGenerator[Session]:
       if caller_data is not None:
           yield caller_data
       else:
           async with session() as data:
               yield data
   ```
   
   #### `atr/db/__init__.py` — `session` (lines 857-861)
   _currently does this_
   Creates a new session context. Objects fetched within this context become 
detached once it exits, but can still be passed to other functions that operate 
on different sessions.
   
   ```python
   @contextlib.asynccontextmanager
   async def session(log_queries: bool = False) -> AsyncGenerator[Session]:
       async with sessionmaker()() as data:
           yield data
   ```
   
   #### `atr/db/interaction.py` — `all_releases` (lines 88-111)
   _currently does this_
   Example of the problematic pattern: objects fetched in one session scope are 
modified outside it (release.project = project), and the `project` parameter 
itself could be a detached object from another session.
   
   ```python
   async def all_releases(project: sql.Project) -> list[sql.Release]:
       """Get all releases for the project, sorted by version."""
       query = sqlmodel.select(sql.Release).where(sql.Release.project_key == 
project.key)
   
       results = []
       async with db.session() as data:
           for result in (await data.execute(query)).all():
               release = result[0]
               results.append(release)
   
       for release in results:
           release.project = project
   
       try:
           # This rejects any non PEP 440 versions
           results.sort(key=lambda r: version.Version(r.version), reverse=True)
       except Exception as e:
   ```
   
   #### `atr/db/interaction.py` — `releases_by_phase` (lines 330-350)
   _currently does this_
   Another instance of the pattern where a detached `project` object is 
assigned to newly-fetched releases after the session closes, which can cause 
issues if those releases are later used in a commit context.
   
   ```python
   async def releases_by_phase(project: sql.Project, phase: sql.ReleasePhase) 
-> list[sql.Release]:
       """Get the releases for the project by phase."""
   
       query = (
           sqlmodel.select(sql.Release)
           .where(
               sql.Release.project_key == project.key,
               sql.Release.phase == phase,
           )
           
.order_by(sql.validate_instrumented_attribute(sql.Release.created).desc())
       )
   
       results = []
       async with db.session() as data:
           for result in (await data.execute(query)).all():
               release = result[0]
               results.append(release)
   
       for release in results:
           # Don't need to eager load and lose it when the session closes
           release.project = project
       return results
   ```
   
   #### `atr/storage/__init__.py` — `Write` (lines 217-303)
   _needs modification_
   The Write class holds a db.Session reference and passes it to WriteAs* 
classes. If model objects from a different session are added/merged in this 
session's commit context, DetachedInstanceError occurs.
   
   ```python
   class Write:
       # Read and Write have authenticator methods which return access outcomes
       # TODO: Still need to send some runtime credentials guarantee to the 
WriteAs* classes
       def __init__(self, authorisation: principal.Authorisation, data: 
db.Session):
           self.__authorisation: Final[principal.Authorisation] = authorisation
           self.__data: Final[db.Session] = data
   ```
   
   #### `atr/storage/__init__.py` — `WriteAsGeneralPublic` (lines 119-133)
   _needs modification_
   Each writer receives the raw db.Session. Writers can inadvertently add 
detached objects from outside this session to the commit context.
   
   ```python
   class WriteAsGeneralPublic(WriteAs):
       def __init__(self, write: Write, data: db.Session):
           self.announce = writers.announce.GeneralPublic(write, self, data)
           self.cache = writers.cache.GeneralPublic(write, self, data)
           self.checks = writers.checks.GeneralPublic(write, self, data)
           self.keys = writers.keys.GeneralPublic(write, self, data)
           self.mail = writers.mail.GeneralPublic(write, self, data)
           self.policy = writers.policy.GeneralPublic(write, self, data)
           self.project = writers.project.GeneralPublic(write, self, data)
           self.release = writers.release.GeneralPublic(write, self, data)
           self.revision = writers.revision.GeneralPublic(write, self, data)
           self.sbom = writers.sbom.GeneralPublic(write, self, data)
           self.ssh = writers.ssh.GeneralPublic(write, self, data)
           self.tokens = writers.tokens.GeneralPublic(write, self, data)
           self.vote = writers.vote.GeneralPublic(write, self, data)
   ```
   
   #### `atr/db/__init__.py` — `Session` (lines 146-160)
   _needs modification_
   The Session class is where type-level guards could be added. For example, 
overriding `add()` and `merge()` to check object session identity, or 
introducing a branded/tagged session type that prevents mixing objects from 
different sessions.
   
   ```python
   class Session(sqlalchemy.ext.asyncio.AsyncSession):
       def __init__(self, *args: Any, **kwargs: Any) -> None:
           explicit_value_passed_by_sessionmaker = kwargs.pop("log_queries", 
None)
           super().__init__(*args, **kwargs)
   
           self.log_queries: bool = global_log_query
           if explicit_value_passed_by_sessionmaker is not None:
               self.log_queries = explicit_value_passed_by_sessionmaker
   ```
   
   ### Where new code would go
   - `atr/db/__init__.py` — after symbol Session
     Add a type-safe wrapper or runtime guard in the Session class that detects 
when a model object from a different session is being added/merged/committed.
   
   ### Proposed approach
   The approach should be two-fold: (1) Add runtime detection in 
`Session.add()` and `Session.merge()` to raise a clear error when a detached 
object from another session is being used, and (2) Introduce type-level 
distinction so that functions receiving model objects mark them as 
potentially-detached (e.g., a `Detached[T]` wrapper or protocol) making it a 
type error to pass them directly to session operations without explicit 
re-attachment.
   
   For the immediate fix, overriding `Session.add()` to check 
`sqlalchemy.inspect(instance).detached` would catch the most common case at 
runtime with a clear error message. For the type-level approach, a `NewType` or 
generic wrapper could distinguish between session-bound and detached model 
instances, though this requires more extensive refactoring of function 
signatures throughout `atr/db/interaction.py` and the storage writers.
   
   ### Suggested patches
   
   #### `atr/db/__init__.py`
   Add runtime guards to Session.add() and Session.merge() that detect detached 
objects and raise a clear error, preventing the class of bugs described in #297.
   
   ````diff
   --- a/atr/db/__init__.py
   +++ b/atr/db/__init__.py
   @@ -146,6 +146,38 @@ class Session(sqlalchemy.ext.asyncio.AsyncSession):
        def __init__(self, *args: Any, **kwargs: Any) -> None:
            explicit_value_passed_by_sessionmaker = kwargs.pop("log_queries", 
None)
            super().__init__(*args, **kwargs)
    
            self.log_queries: bool = global_log_query
            if explicit_value_passed_by_sessionmaker is not None:
                self.log_queries = explicit_value_passed_by_sessionmaker
    
   +    def _check_not_detached(self, instance: Any, operation: str) -> None:
   +        """Raise if instance is a detached SQLModel object from another 
session.
   +
   +        This prevents the reuse of objects fetched in a different session
   +        context, which causes DetachedInstanceError on commit or silent
   +        data corruption.
   +        """
   +        if not isinstance(instance, sqlmodel.SQLModel):
   +            return
   +        try:
   +            state = sqlalchemy.inspect(instance)
   +        except sqlalchemy.exc.NoInspectionAvailable:
   +            return
   +        if state.detached:
   +            raise RuntimeError(
   +                f"Cannot {operation} a detached instance of 
{type(instance).__name__} "
   +                f"(identity={state.identity}). The object was likely loaded 
in a "
   +                f"different session. Use session.merge() with an explicit 
refresh, "
   +                f"or re-query the object in this session."
   +            )
   +
   +    def add(self, instance: Any, _warn: bool = True) -> None:  # type: 
ignore[override]
   +        self._check_not_detached(instance, "add")
   +        super().add(instance, _warn=_warn)
   +
   +    def add_all(self, instances: Any) -> None:  # type: ignore[override]
   +        for instance in instances:
   +            self._check_not_detached(instance, "add_all")
   +        super().add_all(instances)
   +
        # TODO: Need to type all of these arguments correctly
    
        async def begin_immediate(self) -> None:
   ````
   
   ### Open questions
   - Issue #297 is referenced but not available — understanding the exact 
detached-object scenario that triggered it would help refine the solution.
   - Should `Session.merge()` also be guarded, or is merge the intended escape 
hatch for re-attaching detached objects?
   - A full type-level solution (e.g., `Detached[T]` wrapper type) would 
require significant refactoring of `atr/db/interaction.py` functions that 
return model objects outside session scope — is that level of investment 
desired?
   - Some functions in interaction.py deliberately assign detached objects 
(e.g., `release.project = project` in `releases_by_phase`). These would need to 
be refactored to either pass the session through or use only scalar attributes 
from detached objects.
   - Should the guard also check for objects that are `transient` (never 
persisted) vs `detached` (were persisted in another session)?
   
   ### Files examined
   - `atr/db/__init__.py`
   - `atr/storage/__init__.py`
   - `atr/storage/outcome.py`
   - `atr/db/interaction.py`
   - `atr/models/sql.py`
   - `atr/storage/readers/__init__.py`
   - `atr/storage/writers/__init__.py`
   - `atr/principal.py`
   
   ---
   *Draft from a triage agent. A human reviewer should validate before merging 
any change. The agent did not run tests or verify diffs apply.*


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to