GitHub user andrewmusselman created a discussion: Q2 roadmap
## Already shipped
1. **Rename sandbox → runs** — frontend rename, history list, form pre-fill,
re-run button. Component files renamed; URL `/agent/:agentId/sandbox` →
`/agent/:agentId/runs` with new nested `/agent/:agentId/runs/:runId`. Backend
identifiers (`RunCodeRequest`, `sandbox_run` log strings, `sandbox_agent`
placeholder) intentionally kept. ✓ merged to main.
2. **Runs page reads from saved agent doc, not stale agentFlowContext** — the
silent-staleness bug that caused max_tokens overrides to vanish, output schema
to collapse to `{outputText: string}`, and list/json input fields to drop.
Always-refetch when agentId is in URL. ✓ merged to main.
3. **DB perf — bulk ops, optimistic writes, deferred access tracking** —
`save_many`/`delete_many`/`get_many` on the backend (CouchDB uses `_bulk_docs`
and `_all_docs?keys=`). Service-layer rewrites of `set_many` (2 round trips
regardless of N), `clear_namespace` (2 round trips), `set()` optimistic with
conflict retry. Access tracking deferred to `AccessAccumulator` background
flush every 10s. 21 new tests; `data_store_service.py` to 91% coverage. ✓
merged to main.
4. **Per-model llm_settings** — `LlmSettings` carries a `perModel` map keyed by
`<provider>/<model>` so a Sonnet call gets Sonnet's overrides instead of
`invokableModels[0]`'s. ✓ merged to main.
5. **ModelConfigDialog UX** — `max_tokens` as numeric TextField (was an
unusably-precise Slider over 1..128000); inline Alert for mutex conflicts
(temperature/top_p) with one-click resolution buttons; cleared mutex partners
stay cleared across dialog re-open. ✓ merged to main.
6. **Log redaction** — common credential shapes (PATs,
OpenAI/Anthropic/AWS/Google/Slack/Stripe/JWT, Authorization headers, BEGIN
PRIVATE KEY markers, generic api_key=/secret=/password= patterns) stripped from
log/trace events. 61 tests with synthesized fixtures (no literal tokens on disk
to satisfy GitHub's secret scanner). ✓ merged to main.
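As a rough illustration of the log redaction item above, the following is a minimal sketch of pattern-based scrubbing. It is not the merged implementation: the `redact` function, the `_PATTERNS` table, and the `[REDACTED]` placeholder are assumptions made for this example, and only three of the listed credential shapes are covered.

```python
import re

# Illustrative patterns only (assumed for this sketch); the shipped redactor
# covers more credential shapes than these three.
_PATTERNS = [
    # Authorization headers: keep the scheme, drop the credential.
    (re.compile(r"(?i)(authorization:\s*(?:bearer|basic|token)\s+)\S+"),
     r"\1[REDACTED]"),
    # Generic key=value secrets such as api_key=..., secret=..., password=...
    (re.compile(r"(?i)\b(api_key|secret|password)=[^\s&]+"),
     r"\1=[REDACTED]"),
    # PEM private key blocks, matched across lines.
    (re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL),
     "[REDACTED PRIVATE KEY]"),
]


def redact(text: str) -> str:
    """Return text with every known credential shape replaced by a placeholder."""
    for pattern, replacement in _PATTERNS:
        text = pattern.sub(replacement, text)
    return text
```

Test fixtures for a function like this can be assembled at runtime (for example by concatenating string pieces), which keeps literal token-shaped strings off disk.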
## Open backlog — to file as issues
### Issue: Multi-tenant non-blocking runtime (run registry + persistent run state)
**Problem.** Today an agent runs inside the request thread. The streaming SSE
response holds the connection open for the run's full duration. Two
consequences:
- The browser tab is the run. Refresh / navigate / close the tab and the SSE
connection drops; the run actually keeps executing on the backend (asyncio.Task
isn't cancelled on disconnect — the `event_generator` `finally` block awaits
the task), but the user has no way to see results when they come back.
- `uvicorn --reload` kills in-flight runs. We've felt this in practice.
These are also the things that block "true multi-tenancy" — multiple users
running multiple agents at the same time and all seeing their work survive a
tab close, a disconnect, or a brief deploy. The multi-process worker model
isn't a prerequisite (single-process is fine for now); what's missing is the
run state being independent of the request that started it.
**Proposed solution.** Run registry at the user-service level. A dict keyed by
`run_id`, value is a `RunRecord`:
```python
from __future__ import annotations  # Trace / CancelToken are defined elsewhere

import asyncio
from dataclasses import dataclass
from datetime import datetime
from typing import Literal, Optional


@dataclass
class RunRecord:
    run_id: str
    user_id: str
    agent_name: str
    started_at: datetime
    status: Literal["running", "success", "error", "stopped"]
    trace: Trace
    queue: asyncio.Queue
    task: asyncio.Task  # the agent's execution
    cancel_token: CancelToken  # for the stop button
    result: Optional[dict]
    error: Optional[str]
    completed_at: Optional[datetime]
    schema_warnings: Optional[list]
    ops_log: Optional[list]
```
Two storage tiers: in-memory dict for active and recently-completed runs (last
hour); persisted to CouchDB on completion (when the in-memory entry is
evicted). Survives uvicorn restart for completed runs; in-flight runs at
restart still die (full durability would need a worker-process model —
separate, bigger refactor).
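The two-tier idea could look roughly like the sketch below. Everything here is illustrative: `MiniRunRecord` is a trimmed-down stand-in for `RunRecord`, `RunRegistry` and its method names are assumptions, and a plain dict stands in for the CouchDB tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class MiniRunRecord:  # hypothetical, trimmed-down stand-in for RunRecord
    run_id: str
    user_id: str
    status: str
    completed_at: Optional[datetime] = None


class RunRegistry:
    """In-memory tier plus a persistent tier (a dict standing in for CouchDB)."""

    def __init__(self, store: dict, ttl: timedelta = timedelta(hours=1)) -> None:
        self._active: dict = {}
        self._store = store
        self._ttl = ttl

    def add(self, record: MiniRunRecord) -> None:
        self._active[record.run_id] = record

    def get(self, run_id: str) -> Optional[MiniRunRecord]:
        # Prefer the in-memory entry; fall back to the persisted copy.
        record = self._active.get(run_id)
        return record if record is not None else self._store.get(run_id)

    def evict_expired(self, now: datetime) -> list:
        """Persist and drop completed runs older than the TTL; return their ids."""
        evicted = []
        for run_id, record in list(self._active.items()):
            if record.completed_at is None:
                continue  # still running: never evicted
            if now - record.completed_at >= self._ttl:
                self._store[run_id] = record
                del self._active[run_id]
                evicted.append(run_id)
        return evicted
```

In this sketch a periodic background task would call `evict_expired`; how often, and whether persistence should instead happen right at completion, is left open.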
**New endpoints:**
```
POST   /agents/run-code/start    # NEW: kick off a run, return {run_id} immediately
GET    /runs                     # NEW: list current user's runs
GET    /runs/{run_id}            # NEW: fetch run state
GET    /runs/{run_id}/stream     # NEW: subscribe; if running, live SSE + replay;
                                 #      if complete, bulk replay then close
POST   /runs/{run_id}/stop       # NEW: set cancel token; respond 202
DELETE /runs/{run_id}            # NEW: clear from in-memory registry
POST   /agents/run-code/stream   # Existing: kept compatible; first frame now
                                 #      includes run_id
```
**Frontend touches.** A `useRun(run_id)` hook subscribing to
`/runs/{run_id}/stream` and exposing `{status, events, result, error,
isStreaming}`. Used by the runs page now and by the home-page module later.
**Effort.** ~400 LOC backend + ~350 LOC frontend.
**Auth.** Each run is owned by the user who started it. `GET /runs/{run_id}`
rejects with 403 if `request.user.uid != run.user_id`. Standard scope check;
this is what makes multi-tenancy safe.
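A framework-free sketch of that ownership check follows. The exception classes, `OwnedRun`, and `get_run_for_user` are hypothetical names for this example; in the real endpoint the two exceptions would presumably map to 404 and 403 responses.

```python
from dataclasses import dataclass


class RunNotFound(Exception):
    """No run with this id exists (would map to HTTP 404)."""


class RunForbidden(Exception):
    """The run exists but belongs to another user (would map to HTTP 403)."""


@dataclass
class OwnedRun:  # hypothetical stand-in carrying only the fields the check needs
    run_id: str
    user_id: str


def get_run_for_user(runs: dict, run_id: str, uid: str) -> OwnedRun:
    """Look up a run and enforce that the caller owns it."""
    run = runs.get(run_id)
    if run is None:
        raise RunNotFound(run_id)
    if run.user_id != uid:
        raise RunForbidden(run_id)
    return run
```

Routing every `/runs/{run_id}*` handler through one helper like this keeps the scope check from being forgotten on a new endpoint.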
**Open questions.**
- Persistence of completed runs: CouchDB-backed in the same database as
everything else? (Recommended.) Or in-memory only, accept they're gone on
restart? (Honest about today's reality.)
- Single-process assumption: scaling to multiple replicas means the in-memory
registry becomes per-replica, and a request to `/runs/{run_id}/stream` could
land on the wrong replica. Fix is sticky sessions or a shared store (Redis).
Worth knowing now even if not fixing now.
---
### Issue: Runs page subscribes by run_id (not just for the current request)
**Problem.** The runs page today fires `POST /agents/run-code/stream` and reads
the SSE stream inline. If you navigate away, the stream drops; if you come
back, you get nothing. The history list shows past runs only via in-memory
state, so it disappears on refresh.
**Proposed solution.** Refactor the runs page to:
1. Click Run → `POST /agents/run-code/start`, get `{run_id}`.
2. `useRun(run_id)` mounts and subscribes to `/runs/{run_id}/stream`.
3. On navigation away and back, `useRun(run_id)` reconnects to the same
registry entry — replay events that fired during disconnect, then transition to
live.
4. `/agent/:agentId/runs/:runId` deep-links to an existing run (live or
complete).
5. History list reads from `GET /runs?agent_id=X` so it persists across
sessions.
**Depends on:** run registry above.
**Open question.** Reconnect semantics: when a client reconnects, does it get a
full replay of events emitted before connection? Recommended yes — replay all
`RunRecord.trace.events`, then transition to live streaming. Keeps UX simple.
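One way the replay-then-live behaviour could work is sketched below, assuming an append-only event list guarded by an `asyncio.Condition`. `EventLog` and its methods are hypothetical names, not existing code; the real stream would wrap something similar in an SSE response.

```python
import asyncio


class EventLog:
    """Append-only event list that subscribers can replay and then follow live.

    Construct it inside a running event loop.
    """

    def __init__(self) -> None:
        self.events: list = []
        self.closed = False
        self._changed = asyncio.Condition()

    async def publish(self, event: dict) -> None:
        async with self._changed:
            self.events.append(event)
            self._changed.notify_all()

    async def close(self) -> None:
        """Mark the run as finished so subscribers can terminate."""
        async with self._changed:
            self.closed = True
            self._changed.notify_all()

    async def subscribe(self):
        """Yield every past event first, then new ones until the log closes."""
        index = 0
        while True:
            async with self._changed:
                await self._changed.wait_for(
                    lambda: index < len(self.events) or self.closed
                )
                batch = self.events[index:]
                finished = self.closed
            for event in batch:  # yield outside the lock
                yield event
            index += len(batch)
            if finished and index >= len(self.events):
                return
```

Because each subscriber tracks its own index, a client that connects after completion gets the bulk replay and then the stream closes, which matches the behaviour described for `/runs/{run_id}/stream`.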
---
### Issue: Home page Running Jobs module
**Problem.** A user running multiple agents has no cross-agent view of "what's
running right now" or "what just finished". To check on something started
elsewhere they have to navigate to that specific agent's runs page.
**Proposed solution.** New module on HomePage above (or beside) the agents
list. Lists the user's runs across all agents — status chip, agent name,
started-at, run_id. Click → `/agent/:agentId/runs/:runId`. Auto-refresh by
polling `GET /runs` every few seconds. Live status updates by re-fetching, not
subscribing to SSE per row (overkill).
**Depends on:** run registry, runs page using run_id.
---
### Issue: Stop button
**Problem.** Long-running agents sometimes need to be interrupted. The earlier
"Cancel" button only aborted the client-side fetch; the server-side task kept
running until completion. We want the run to actually stop.
**Proposed solution.** Cooperative cancellation, layered:
- **CancelToken via contextvar** (same pattern as Trace). The agent runtime
checks `should_stop()` between operations.
- **Enforcement at structural boundaries.** Wrap `tools.call_llm`,
`tools.data_store.*`, and `gofannon_client.call` with an entry check — if
stopping, raise `AgentStopped` immediately without executing. In-flight LLM
calls finish naturally; only the next attempt to do anything observable raises.
This avoids `task.cancel()` raising `CancelledError` mid-await inside the LLM
call's HTTP request — too aggressive.
**UI.** Stop button next to Run; disabled when no run in flight. While stopping
(after click, before halt), button shows "Stopping… (after current LLM call
completes)" disabled. Run's outcome becomes a third state `stopped`, neutral
chip color in the Progress Log (gray with a stop icon, not red).
**Depends on:** run registry (for the cancel token to live somewhere
addressable by `POST /runs/{run_id}/stop`).
**Open questions.**
- Stop semantics for chained agents: when agent X is stopping and X has called
Y, does Y stop too? Recommended yes — stop means the whole tree. Contextvar
makes this trivial.
- UI feedback during "Stopping…" — could take 30+ seconds in the worst case
(waiting for LLM response). UI should be honest: "Stopping after current LLM
call completes…" rather than promising instant.
---
### Issue: Per-agent environment variables
**Problem.** Agents have tunable knobs (e.g. `GITHUB_PUSH_CONCURRENCY` in the
ASVS pipeline) that today have to be hardcoded, threaded through `inputText`,
or set at the host level (invisible coupling). None of those is right.
**Proposed solution.** Two halves:
- **Editor accordion + persistence.** New `EnvVarsAccordion.jsx` between Data
Store Configuration and Agent Code in the agent editor. Three columns: Key /
Value / Description. POSIX-style key validation. Persists as `env_vars:
List[AgentEnvVar]` on the Agent model.
- **Runtime injection via contextvar-bound environ proxy.** Mutating
`os.environ` directly under a lock would serialize all agent runs. Instead
install an `_EnvironProxy` wrapping `os.environ` that consults a contextvar for
the per-task overlay. Each run sets the contextvar before invoking the agent
function. asyncio tasks inherit contextvar context, so concurrent runs see
different overlays without locking.
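A minimal sketch of the overlay lookup is shown below. For brevity it wraps an arbitrary mapping with a plain `MutableMapping` rather than subclassing `os._Environ` or monkey-patching, and it does not install itself as `os.environ`; `EnvironProxy`, `_env_overlay`, and `run_agent` are hypothetical names.

```python
import asyncio
import contextvars
from collections.abc import MutableMapping

# Hypothetical contextvar holding the current run's env overlay (or None).
_env_overlay: contextvars.ContextVar = contextvars.ContextVar(
    "env_overlay", default=None
)


class EnvironProxy(MutableMapping):
    """Mapping that checks the per-task overlay before the base environment."""

    def __init__(self, base: MutableMapping) -> None:
        self._base = base

    def _overlay(self) -> dict:
        return _env_overlay.get() or {}

    def __getitem__(self, key):
        overlay = self._overlay()
        if key in overlay:
            return overlay[key]
        return self._base[key]

    def __setitem__(self, key, value) -> None:
        self._base[key] = value  # writes go to the shared base, not the overlay

    def __delitem__(self, key) -> None:
        del self._base[key]

    def __iter__(self):
        return iter({**self._base, **self._overlay()})

    def __len__(self) -> int:
        return len({**self._base, **self._overlay()})


async def run_agent(agent_fn, overlay: dict):
    """Bind the overlay for this task only, then invoke the agent function."""
    _env_overlay.set(overlay)
    return await agent_fn()
```

When several `run_agent` calls are scheduled as separate tasks (for example through `asyncio.gather`), each task works on its own copy of the context, so one run's overlay is never visible to another.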
**What this is NOT.** Not for secrets — values are plaintext on the agent doc
and visible in trace events. The user-profile API Keys feature handles secrets.
**Effort.** ~120 LOC backend + ~280 LOC frontend + ~80 LOC tests.
**Open questions.**
- Subclass `os._Environ` vs. monkey-patch `os.environ.__class__.__getitem__`.
Both are hacks. Subclass is more explicit; monkey-patch is shorter. Recommend
subclass.
- UI without runtime is misleading (saves values, agents don't see them).
Recommend shipping both halves at once.
---
### Issue: Composer-LLM ignores multi-key output schema
**Problem.** When a user declares an output schema with multiple keys, the
composer LLM frequently generates code that returns `{outputText: ...}` only,
ignoring the declared schema. The validator at
`dependencies.py:validate_output_against_schema` correctly flags the mismatch
as a warning ("missing required output keys" + "unexpected output key
outputText"), but the agent code itself was generated wrong, so every run
produces noise.
This is layered on top of shipped item 2 above, which fixed the upstream bug
where the runs page was sending the *default* `{outputText: "string"}` schema
as if it were the user's declared schema. With item 2 fixed, the user's actual
schema does flow into the composer prompt, but the LLM still ignores it.
**Proposed solution.** Iterate on the composer prompt at
`agent_factory/prompts.py`. The current prompt (around
`output_schema=output_schema_str`, line 140 area) lists the schema in the
instructions but the LLM doesn't reliably comply.
Three angles, increasing effort:
- Strengthen the prompt — try few-shot examples of correct vs incorrect
returns, more emphatic phrasing, place the schema in the user message instead
of the system prompt for higher signal.
- Validate generated code post-hoc — parse the AST, check the `return`
statement matches the schema keys; if not, regenerate or surface an error
before saving.
- Generate the return statement directly from the schema — emit a `return
{key1: ..., key2: ...}` skeleton the LLM fills in, rather than asking for a
free-form return.
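The post-hoc validation angle could be sketched with the standard `ast` module as below. The function name `return_key_mismatches` and its message format are assumptions for this example; it only inspects `return` statements that return a dict literal, so code that builds the result in a variable would need a different check.

```python
import ast


def return_key_mismatches(source: str, schema_keys: set) -> list:
    """Report dict-literal returns whose keys differ from the declared schema."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Return) or not isinstance(node.value, ast.Dict):
            continue  # non-literal returns cannot be checked statically here
        keys = {
            k.value
            for k in node.value.keys
            if isinstance(k, ast.Constant) and isinstance(k.value, str)
        }
        missing = sorted(schema_keys - keys)
        unexpected = sorted(keys - schema_keys)
        if missing or unexpected:
            problems.append(
                f"line {node.lineno}: missing={missing} unexpected={unexpected}"
            )
    return problems
```

A non-empty result could trigger a regeneration attempt or surface an error before the agent is saved.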
Recommended starting point: prompt iteration. The validator already exists;
tightening compliance from the prompt side is the smallest change.
---
### Issue (filed): DynamoDB backend bulk APIs
Already filed alongside the DB perf commit. The default loop fallback works,
but DynamoDB users don't get the bulk-API wins until `BatchWriteItem` (25/req
chunking, retrying `UnprocessedItems`) and `BatchGetItem` (100/req chunking,
retrying `UnprocessedKeys`) are implemented in
`services/database_service/dynamodb.py`. References commit
`3d88b234c06d2c977f8ca4bb043b529d21f4e9d1`.
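The chunk-and-retry loop this implies might look like the sketch below. `chunked`, `batch_write_all`, and the injected `send_batch` callable are hypothetical: `send_batch` stands in for a wrapper around the real batch call that returns whatever the service reported as unprocessed. A real implementation would also back off between attempts.

```python
from typing import Callable, Iterator, Sequence


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_write_all(
    items: Sequence,
    send_batch: Callable[[list], list],
    chunk_size: int = 25,  # BatchWriteItem limit; use 100 for BatchGetItem
    max_attempts: int = 5,
) -> None:
    """Send items in chunks, re-sending whatever comes back unprocessed."""
    for chunk in chunked(items, chunk_size):
        pending = list(chunk)
        for _ in range(max_attempts):
            pending = list(send_batch(pending))
            if not pending:
                break
        else:
            # Retries exhausted with items still pending.
            raise RuntimeError(f"{len(pending)} items still unprocessed")
```

Injecting `send_batch` keeps the chunking and retry logic testable without a DynamoDB connection.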
---
### Roadmap (separate plan): headless invocation auth
Filed earlier as a follow-up, not as a single issue. The deployed-agent
endpoint `/rest/<friendly_name>` and the run-code endpoints all gate on
`get_current_user`, which only knows how to read the `gofannon_sid` cookie. CI
runners can't get that cookie without the user-driven login flow. Three
candidate directions:
- Per-agent API keys (smallest scope, most natural CI integration, ~200 LOC).
- Per-user service-account tokens (more flexible, larger blast radius if
leaked).
- GitHub OIDC (cleanest secret-handling, GHA-specific).
Each is a real feature, not a quick patch. Token security has long-tail
concerns (rotation, scoping, rate limiting per key, revocation, audit
logging) that are easy to underspec. This work sequences naturally after the
run registry: without async start + polling, headless agent calls of any
meaningful duration are unworkable.
Recommend filing per-agent API keys (the first option above) first since it's
the most scoped and unblocks the immediate ask.
---
## Suggested filing order
The first three open backlog items (run registry, runs page subscribes, home
page module) are best filed as one tracking issue with the others as sub-tasks,
since the user-visible payoff of multi-tenancy doesn't land until all three are
in. Stop button and env vars are independent and can be standalone issues.
Composer-LLM schema compliance is also independent and should be its own issue.
Headless auth is its own RFC-style document, not a single issue.
GitHub link: https://github.com/apache/tooling-gofannon/discussions/6