bobbai00 opened a new issue, #5561:
URL: https://github.com/apache/texera/issues/5561
### Feature Summary
The `agent-service` currently authenticates its own requests, and the check
is both weak and incomplete:
- The only token handling is the **optional** `userToken` in `POST
/api/agents`. `agent-service/src/api/auth-api.ts` base64-decodes the JWT
payload and checks **only the `exp` claim** — the **signature is never
verified** (`validateToken = !isTokenExpired`), so a forged-but-unexpired token
passes.
- Every other endpoint — `GET /api/agents`, `GET /api/agents/models`, `PATCH
…/settings`, and the `/api/agents/:id/react` WebSocket that drives LLM calls
(and provider spend) — performs **no auth at all**.
- At the gateway, the ext-authz `SecurityPolicy`
(`bin/k8s/templates/gateway-security-policy.yaml`) targets only the
`*-dynamic-routes` HTTPRoute (wsapi / executions-stats / pve). The
`*-agent-service-route` is **not** targeted, so `/api/agents` bypasses
ext-authz entirely. (Single-node nginx has no ext-authz at all.)
Net effect: anyone who can reach the gateway can drive the agent-service
(and spend the LLM budget), and the only "JWT validation" is a
non-cryptographic decode performed by the agent-service itself.
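
To make the gap concrete, the sketch below contrasts a decode-only expiry check with a signature check. It assumes an HS256 (HMAC-SHA256) token and a shared secret purely for illustration; the issue does not say which algorithm `JwtParser.parseToken` uses, and every function name here is hypothetical rather than the actual `auth-api.ts` code.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

type JwtPayload = { exp?: number; [claim: string]: unknown };

const b64url = (text: string): string => Buffer.from(text, "utf8").toString("base64url");

// Decode one base64url JWT segment into an object, or null if it is malformed.
function parseJsonSegment(segment: string): Record<string, unknown> | null {
  try {
    const value: unknown = JSON.parse(Buffer.from(segment, "base64url").toString("utf8"));
    return typeof value === "object" && value !== null ? (value as Record<string, unknown>) : null;
  } catch {
    return null;
  }
}

// Hypothetical test helper: mint an HS256 token with the given secret.
function signHs256(payload: JwtPayload, secret: string): string {
  const header = b64url(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const signingInput = `${header}.${b64url(JSON.stringify(payload))}`;
  const signature = createHmac("sha256", secret).update(signingInput).digest().toString("base64url");
  return `${signingInput}.${signature}`;
}

// The weak check described above: decode the payload and look only at `exp`.
function isUnexpiredDecodeOnly(token: string, nowSec: number): boolean {
  const [, body = ""] = token.split(".");
  const exp = parseJsonSegment(body)?.["exp"];
  return typeof exp === "number" && exp > nowSec;
}

// A signature-verifying check: recompute the HMAC before trusting any claim.
function verifyHs256(token: string, secret: string, nowSec: number): boolean {
  const parts = token.split(".");
  if (parts.length !== 3) return false;
  const [header = "", body = "", signature = ""] = parts;
  if (parseJsonSegment(header)?.["alg"] !== "HS256") return false;
  const expected = createHmac("sha256", secret).update(`${header}.${body}`).digest();
  const given = Buffer.from(signature, "base64url");
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return false;
  return isUnexpiredDecodeOnly(token, nowSec);
}
```

A token signed with an attacker-chosen key and a future `exp` passes `isUnexpiredDecodeOnly` but fails `verifyHs256`, which is the forged-but-unexpired case described above.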
Authentication and authorization for the agent-service should instead be
**owned by the access-control-service**, the same way computing-unit /
execution / pve requests are already authorized via Envoy ext-authz. The
access-control-service should:
1. **Authenticate** — verify the JWT signature (it already does this via
`JwtParser.parseToken`), and
2. **Authorize** — decide whether the user may access the **specific agent**
named in the request.
We do not yet persist an agent→owner mapping, but managing that access check
is the access-control-service's responsibility, consistent with how it already
owns dataset / computing-unit access.
_(Surfaced while reviewing #5558, which moved the model list + LLM path onto
the agent-service; relates to the role gate added in #5421.)_
### Proposed Solution or Design
**Phase 1 — Authentication (ext-authz in front of the agent-service)**
- Add an Envoy `SecurityPolicy` targeting the `*-agent-service-route`
HTTPRoute, mirroring `gateway-security-policy.yaml`, pointing ext-auth at the
access-control-service `/api/auth` and injecting `x-user-id` / `x-user-name` /
`x-user-email`.
- Add a branch to `AccessControlResource.authorize()` for `/api/agents…`
that verifies the JWT (`parseToken`) and requires `REGULAR`/`ADMIN`
(`SessionUser.isRoleOf`) — i.e. restore the gate the old LiteLLM proxy had —
returning the trusted identity headers on success and 401/403 otherwise. Token
extraction already supports header `Authorization: Bearer`, query
`access-token` (for the WS), and body `token`.
- Frontend: attach the JWT to **all** agent-service calls. Today
`agentHeaders()` only sets `X-Agent-Workflow-Id`, `fetchModelTypes()` sends
nothing, and the `/react` WebSocket carries no token. REST → `Authorization:
Bearer`; WS → `?access-token=…` (mirror the workflow WebSocket).
- Remove the agent-service's home-grown `decodeJWT` / `validateToken`; have
it **trust the gateway-injected `x-user-*` headers**, which only exist after
ext-authz passes.
- Single-node (nginx, no Envoy): add an `auth_request` subrequest to
`/api/auth`, or explicitly accept that dev single-node stays open.
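
The frontend and agent-service changes in Phase 1 could look roughly like the sketch below. The function names (`agentAuthHeaders`, `agentReactWsUrl`, `trustedUserFromHeaders`), the string type of the workflow id, and the assumption that `x-user-id` carries a numeric uid are all illustrative and not taken from the codebase.

```typescript
// Frontend: attach the JWT to every REST call to the agent-service.
function agentAuthHeaders(jwt: string, workflowId?: string): Record<string, string> {
  const headers: Record<string, string> = { Authorization: `Bearer ${jwt}` };
  if (workflowId !== undefined) headers["X-Agent-Workflow-Id"] = workflowId;
  return headers;
}

// Frontend: browsers cannot set headers on a WebSocket handshake,
// so the token travels in the `access-token` query parameter.
function agentReactWsUrl(baseWsUrl: string, agentId: string, jwt: string): string {
  const url = new URL(`/api/agents/${encodeURIComponent(agentId)}/react`, baseWsUrl);
  url.searchParams.set("access-token", jwt);
  return url.toString();
}

type TrustedUser = { uid: number; name?: string; email?: string };
type HeaderBag = Record<string, string | string[] | undefined>;

const first = (v: string | string[] | undefined): string | undefined =>
  Array.isArray(v) ? v[0] : v;

// agent-service: read the identity injected by the gateway after ext-authz.
// Returns null when the headers are absent, so the caller can reject the request.
function trustedUserFromHeaders(headers: HeaderBag): TrustedUser | null {
  const rawUid = first(headers["x-user-id"]);
  if (rawUid === undefined || !/^\d+$/.test(rawUid)) return null; // assumes numeric uid
  return {
    uid: Number.parseInt(rawUid, 10),
    name: first(headers["x-user-name"]),
    email: first(headers["x-user-email"]),
  };
}
```

Trusting these headers is only sound if clients cannot set them directly, so this presumably requires that the agent-service is reachable only through the gateway and that client-supplied `x-user-*` headers are overwritten or stripped there.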
**Phase 2 — Authorization (per-agent access)**
- Introduce an agent **ownership / ACL** model: associate each agent with
its creating user (uid). Agents are currently in-memory only (`agentStore`),
lost on restart and not tied to a user — so this needs a durable mapping.
- The access-control-service answers "may uid X act on agent Y?". Options:
- **(a)** access-control reads an agent→owner mapping from a shared store
(DB) and decides — keeps the authz decision fully inside access-control
(matches the stated goal). The agent id is available from the request path
(`/api/agents/{id}/…`), which ext-auth already receives, exactly like `cuid`
today.
- **(b)** Hybrid: access-control injects the trusted uid and the
agent-service enforces ownership locally. Simpler, but the authz logic lives
outside access-control.
- This also sets up future agent sharing/granting (like dataset / workflow
ACLs).
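
The "may uid X act on agent Y?" decision can be sketched as follows. This is an illustration only: the access-control-service is not written in TypeScript, the in-memory `Map` stands in for whatever durable store is chosen, and `agentIdFromPath` / `authorizeAgentRequest` are hypothetical names. The same logic would apply in either option, running in access-control for (a) or in the agent-service for (b).

```typescript
type Decision = "allow" | "deny";

// `/api/agents/models` is a collection endpoint, not an agent id.
const COLLECTION_SEGMENTS = new Set(["models"]);

// Extract the agent id from `/api/agents/{id}/…`; null for collection routes.
function agentIdFromPath(path: string): string | null {
  const match = /^\/api\/agents\/([^/?#]+)/.exec(path);
  const segment = match?.[1];
  if (segment === undefined || COLLECTION_SEGMENTS.has(segment)) return null;
  return segment;
}

// Assumes the caller is already authenticated (Phase 1) and that only
// agent-service paths are dispatched here.
function authorizeAgentRequest(
  uid: number,
  path: string,
  owners: ReadonlyMap<string, number>, // agent id -> owner uid (assumed shape)
): Decision {
  const agentId = agentIdFromPath(path);
  if (agentId === null) return "allow"; // collection route: authentication is enough
  // Unknown agents and agents owned by someone else are both denied.
  return owners.get(agentId) === uid ? "allow" : "deny";
}
```

Denying unknown agent ids by default keeps the check fail-closed if the ownership index and the in-memory `agentStore` ever disagree.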
**Open design questions**
- Where/how to persist agent ownership (a dedicated table vs. persisting
agents themselves; agents are ephemeral today).
- Does Phase 2 require persisting agents, or just an ownership index keyed
by agent id?
- The existing `userToken` body field should stay: it is the user's token
forwarded to the dashboard-service for delegated workflow CRUD, which is
separate from authenticating the caller.
### Affected Area
- Deployment / Infrastructure (Envoy `SecurityPolicy`, nginx)
- Workflow UI (JWT propagation for REST + WebSocket)
- access-control-service and agent-service