andrewmusselman opened a new issue, #21:
URL: https://github.com/apache/tooling-gofannon/issues/21

   ### Summary
   Today every agent run executes inside the request thread, with the SSE 
response holding the connection open for the run's full duration. Closing the 
browser tab orphans the run from the user's view (the asyncio.Task keeps 
executing but they have no way to reconnect). Restarting the api container 
kills every in-flight run. This blocks true multi-tenancy and basic operational 
visibility.
    
   ### Details
   **Problem:**
   - The browser tab *is* the run. Refresh / navigate / close drops the SSE; 
the run keeps executing on the backend but the user can't see results when they 
return.
   - `uvicorn --reload` kills in-flight runs in dev. Felt repeatedly.
   - No way to list "what's running right now" — operators cannot answer 
"should I kill my first run before I start another?" without tailing logs.
   - All downstream features (stop button, home-page running-jobs module, runs 
page subscribe-by-id, headless invocation) require a run identity that exists 
independently of the request that started it.
   ### Proposed solution
   Run registry at the user-service level — a dict keyed by `run_id`, value is 
a `RunRecord` dataclass:
    
   ```python
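   import asyncio
   from dataclasses import dataclass
   from datetime import datetime
   from typing import Literal, Optional

   # Trace and CancelToken are assumed to be defined elsewhere in the project.
   @dataclass
   class RunRecord:
       run_id: str
       user_id: str
       agent_name: str
       started_at: datetime
       status: Literal["running", "success", "error", "stopped"]
       trace: Trace
       queue: asyncio.Queue
       task: asyncio.Task
       cancel_token: CancelToken
       # Optional fields default to None so a record can be created at start.
       result: Optional[dict] = None
       error: Optional[str] = None
       completed_at: Optional[datetime] = None
       schema_warnings: Optional[list] = None
       ops_log: Optional[list] = None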
   ```
    
   Two storage tiers: an in-memory dict holds active and recently completed 
runs (last hour); a completed run is persisted to CouchDB when its in-memory 
entry is evicted.
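
A minimal sketch of the two-tier idea, under stated assumptions: `RunRegistry`, `MiniRun`, and `evict_expired` are hypothetical names, `MiniRun` is a trimmed stand-in for `RunRecord`, and the `persist` callable stands in for whatever CouchDB write the service ends up using.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, List, Optional

RETENTION = timedelta(hours=1)  # the "last hour" window from the proposal


@dataclass
class MiniRun:  # hypothetical, trimmed stand-in for RunRecord
    run_id: str
    status: str = "running"
    completed_at: Optional[datetime] = None


class RunRegistry:  # hypothetical name
    def __init__(self, persist: Callable[[MiniRun], None]) -> None:
        self._runs: Dict[str, MiniRun] = {}
        self._persist = persist  # assumption: wraps the CouchDB write

    def add(self, run: MiniRun) -> None:
        self._runs[run.run_id] = run

    def get(self, run_id: str) -> Optional[MiniRun]:
        return self._runs.get(run_id)

    def evict_expired(self, now: datetime) -> List[str]:
        """Persist, then drop, runs that completed more than RETENTION ago."""
        expired = [
            run for run in self._runs.values()
            if run.completed_at is not None and now - run.completed_at > RETENTION
        ]
        for run in expired:
            self._persist(run)  # write first, so a failed write keeps the entry
            del self._runs[run.run_id]
        return [run.run_id for run in expired]
```

Running runs are never evicted in this sketch because `completed_at` stays `None` until the run finishes. How often `evict_expired` is called (timer, on each request, or both) is left open.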
    
   New endpoints:
   ```
   POST   /agents/run-code/start     # Returns {run_id} immediately, run continues in background
   GET    /runs                      # List current user's runs
   GET    /runs/{run_id}             # Fetch run state
   GET    /runs/{run_id}/stream      # SSE subscribe (live + replay)
   POST   /runs/{run_id}/stop        # Set cancel token; respond 202
   DELETE /runs/{run_id}             # Clear in-memory entry
   POST   /agents/run-code/stream    # Existing, kept compatible; first frame includes run_id
   ```
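
The "live + replay" behaviour of `GET /runs/{run_id}/stream` could look roughly like the following. This is a sketch, not the proposed implementation: `RunChannel` is a hypothetical helper, and it assumes one queue per subscriber plus a shared history list, since a single `asyncio.Queue` hands each event to only one reader.

```python
import asyncio
from typing import AsyncIterator, List


class RunChannel:  # hypothetical helper, not part of the proposal
    def __init__(self) -> None:
        self.history: List[dict] = []
        self._subscribers: List[asyncio.Queue] = []
        self.closed = False

    def publish(self, event: dict) -> None:
        """Record the event for replay and fan it out to live subscribers."""
        self.history.append(event)
        for queue in self._subscribers:
            queue.put_nowait(event)

    def close(self) -> None:
        """Mark the run finished and wake every subscriber with a sentinel."""
        self.closed = True
        for queue in self._subscribers:
            queue.put_nowait(None)

    async def subscribe(self) -> AsyncIterator[dict]:
        queue: asyncio.Queue = asyncio.Queue()
        # Snapshot and registration happen with no await in between, so on a
        # single event loop no event can be missed or delivered twice.
        backlog = list(self.history)
        finished = self.closed
        if not finished:
            self._subscribers.append(queue)
        try:
            for event in backlog:  # replay phase
                yield event
            while not finished:  # live phase
                event = await queue.get()
                if event is None:
                    break
                yield event
        finally:
            # Runs when the consumer stops iterating or the run closes.
            if queue in self._subscribers:
                self._subscribers.remove(queue)
```

A subscriber that disconnects only removes its own queue; the publisher and the history are untouched, which is what lets a later reconnect replay from the start.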
    
   Each run is owned by the user who started it. `GET /runs/{run_id}` returns 
403 if `request.user.uid != run.user_id`. Once ISSUE-002 lands, access widens 
to any run whose `run.workspace_id` is in `session.workspace_ids`.
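
As an illustration of the ownership rule, the check can be kept as a small pure function that the route calls before returning anything. `can_read_run` and its parameter names are hypothetical, and the workspace branch only reflects the widening described for ISSUE-002, which has not landed.

```python
from typing import Iterable, Optional


def can_read_run(
    requester_uid: str,
    run_user_id: str,
    run_workspace_id: Optional[str] = None,
    session_workspace_ids: Iterable[str] = (),
) -> bool:
    """Return True when the requester may read the run; callers map False to 403."""
    if requester_uid == run_user_id:
        return True
    # Assumed post-ISSUE-002 behaviour: a shared workspace also grants access.
    return run_workspace_id is not None and run_workspace_id in set(session_workspace_ids)
```

Keeping the rule out of the route handler makes the "user A cannot read user B's run" test a plain unit test with no HTTP client involved.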
    
   ### Acceptance Criteria
   - [ ] Fixed: `RunRecord` dataclass and in-memory registry implemented
   - [ ] Fixed: `POST /agents/run-code/start` returns `run_id` and dispatches 
asyncio.Task
   - [ ] Fixed: `GET /runs` lists runs for current user
   - [ ] Fixed: `GET /runs/{run_id}/stream` subscribes via SSE with 
replay-then-live
   - [ ] Fixed: completed runs persisted to CouchDB after eviction
   - [ ] Fixed: existing `POST /agents/run-code/stream` endpoint emits `run_id` 
in first frame (backward-compatible)
   - [ ] Test added: closing the SSE connection does not kill the underlying run
   - [ ] Test added: reconnecting to a running run replays past events before 
going live
   - [ ] Test added: user A cannot read user B's run
   - [ ] Test added: completed runs are queryable for at least 1 hour
   - [ ] Documentation: API reference for new endpoints
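
The "closing the SSE connection does not kill the underlying run" criterion could be exercised along these lines. This is a sketch under assumptions: `fake_agent` is a hypothetical stand-in for the real agent execution, and reading one event then walking away stands in for a client disconnect.

```python
import asyncio


async def fake_agent(queue: asyncio.Queue, steps: int) -> dict:
    """Hypothetical stand-in for an agent run that emits one event per step."""
    for i in range(steps):
        await asyncio.sleep(0)  # yield to the event loop, like real I/O would
        queue.put_nowait({"step": i})
    return {"steps": steps}


async def start_then_disconnect() -> tuple:
    queue: asyncio.Queue = asyncio.Queue()
    # The run is owned by its own task, not by whoever is reading events.
    task = asyncio.create_task(fake_agent(queue, 3))
    first = await queue.get()  # the client reads one event...
    # ...and then disconnects; nothing here cancels the task.
    result = await task
    return first, result
```

The real test would additionally drive the HTTP layer, but the property being checked is the same: the task's lifetime is independent of the reader's.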
   ### References
   - File: `webapp/packages/api/user-service/routes.py:659` (existing stream 
endpoint)
   - File: `webapp/packages/api/user-service/dependencies.py:148-200` 
(_execute_agent_code)
   - Tracker: FIXES.md item #2
   ### Priority
   **High** - Foundation for ISSUE-005 (runs page subscribes), ISSUE-006 (home 
page Running Jobs), ISSUE-007 (stop button), and ISSUE-015 (headless invocation 
auth). Single-process assumption acceptable for now; scaling to replicas 
requires sticky sessions or shared store (future work).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

