bobbai00 opened a new pull request, #5276:
URL: https://github.com/apache/texera/pull/5276

   ### What changes were proposed in this PR?
   
   Each agent was a stateful object in a process-local `Map`, never persisted, 
so a restart / crash / redeploy lost every agent and its conversation. This 
adds **opt-in disk persistence** for agents:
   
   - **`TexeraAgent.toSnapshot()` / `restoreFromSnapshot()`** — capture and 
restore the durable state: the ReAct step tree, HEAD, settings, delegate 
metadata, and the workflow being edited. The short-lived user token and the 
recomputable execution-result cache are intentionally excluded. `createdAt` is 
preserved across restarts.
   - **`AgentSnapshotStore`** (`src/persistence/`) — one JSON file per agent, 
atomic writes (temp file + rename), debounced coalescing of rapid updates, and 
`loadAll()` that skips unreadable/unsupported files.
   - **Server wiring** — persists on create / clear / checkout / settings 
change and after each WebSocket turn; removes the file on delete; and 
**rehydrates all agents on startup** via `rehydrateAgents`. Enabled by 
`AGENT_STATE_DIR`; leaving it empty keeps the previous in-memory-only behavior.
   
   ```
   Before:  restart -> agentStore empty            -> all agents + history lost
   After:   restart -> loadAll() + rehydrate       -> agents restored from disk
   ```
   
   ### Any related issues, documentation, discussions?
   
   Closes #5267
   
   ### How was this PR tested?
   
   ```
   cd agent-service
   bun test            # 110 pass, 0 fail
   bun run typecheck   # clean
   bun run format:check
   ```
   New tests:
   - `agent/texera-agent-snapshot.test.ts` — `toSnapshot` of a fresh agent, 
`createdAt` round-trip, `restoreFromSnapshot` restoring 
conversation/settings/workflow/delegate, a full JSON round-trip deep-equal, and 
version rejection.
   - `persistence/agent-snapshot-store.test.ts` — save/load, directory 
auto-create, overwrite, `loadAll` (incl. skipping 
corrupt/unsupported/non-matching files), remove (incl. missing), and debounced 
`scheduleSave` + `flush` coalescing.
   - `server.test.ts` — a created agent is written to disk, delete removes the 
file, and `rehydrateAgents` restores a persisted agent so it is served again.
   
   ### Was this PR authored or co-authored using generative AI tooling?
   
   Generated-by: Claude Opus 4.8 (1M context)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to