bobbai00 opened a new issue, #5561:
URL: https://github.com/apache/texera/issues/5561

   ### Feature Summary
   
   The `agent-service` currently authenticates its own requests, and the check 
is both weak and incomplete:
   
   - The only token handling is the **optional** `userToken` in `POST 
/api/agents`. `agent-service/src/api/auth-api.ts` base64-decodes the JWT 
payload and checks **only the `exp` claim** — the **signature is never 
verified** (`validateToken = !isTokenExpired`), so a forged-but-unexpired token 
passes.
   - Every other endpoint — `GET /api/agents`, `GET /api/agents/models`, `PATCH 
…/settings`, and the `/api/agents/:id/react` WebSocket that drives LLM calls 
(and provider spend) — performs **no auth at all**.
   - At the gateway, the ext-authz `SecurityPolicy` 
(`bin/k8s/templates/gateway-security-policy.yaml`) targets only the 
`*-dynamic-routes` HTTPRoute (wsapi / executions-stats / pve). The 
`*-agent-service-route` is **not** targeted, so `/api/agents` bypasses 
ext-authz entirely. (Single-node nginx has no ext-authz at all.)
   
   Net effect: anyone who can reach the gateway can drive the agent-service 
(and spend the LLM budget), and the only "JWT validation" is a 
non-cryptographic decode performed by the agent-service itself.
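
   To make the gap concrete, the sketch below reconstructs an expiry-only check of the kind described above and shows that it accepts a token nobody signed. It is not the code in `auth-api.ts`: the helpers `b64urlEncode`, `b64urlDecode`, and `forgeToken`, the explicit `nowSeconds` parameter, and the claim values are assumptions made for illustration. Only the `validateToken = !isTokenExpired` shape comes from the description above.

```typescript
// Base64url helpers on top of the atob/btoa globals (browsers, Node 16+).
// They are only used with ASCII JSON here.
function b64urlEncode(text: string): string {
  return btoa(text).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

function b64urlDecode(segment: string): string {
  const b64 = segment.replace(/-/g, "+").replace(/_/g, "/");
  return atob(b64 + "=".repeat((4 - (b64.length % 4)) % 4));
}

// Hypothetical reconstruction of an expiry-only check: the payload is decoded
// and `exp` is compared to the clock; the signature segment is never inspected.
function isTokenExpired(token: string, nowSeconds: number): boolean {
  const [, payloadSegment] = token.split(".");
  if (payloadSegment === undefined) return true; // not shaped like a JWT
  try {
    const payload = JSON.parse(b64urlDecode(payloadSegment));
    return typeof payload.exp !== "number" || payload.exp <= nowSeconds;
  } catch {
    return true; // an undecodable payload is treated as expired
  }
}

const validateToken = (token: string, nowSeconds: number): boolean =>
  !isTokenExpired(token, nowSeconds);

// An attacker needs no key: any header, any claims, any signature bytes.
function forgeToken(claims: Record<string, unknown>): string {
  const header = b64urlEncode(JSON.stringify({ alg: "HS256", typ: "JWT" }));
  const payload = b64urlEncode(JSON.stringify(claims));
  return `${header}.${payload}.not-a-real-signature`;
}

const now = 1700000000; // fixed clock so the example is deterministic
const forged = forgeToken({ sub: "attacker", role: "ADMIN", exp: now + 3600 });
console.log(validateToken(forged, now)); // true: accepted despite the bogus signature
```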
   
   Authentication and authorization for the agent-service should instead be 
**owned by the access-control-service**, the same way computing-unit / 
execution / pve requests are already authorized via Envoy ext-authz. The 
access-control-service should:
   
   1. **Authenticate** — verify the JWT signature (it already does this via 
`JwtParser.parseToken`), and
   2. **Authorize** — decide whether the user may access the **specific agent** 
named in the request.
   
   We do not yet persist an agent→owner mapping, but managing that access check 
is the access-control-service's responsibility, consistent with how it already 
owns dataset / computing-unit access.
   
   _(Surfaced while reviewing #5558, which moved the model list + LLM path onto 
the agent-service; relates to the role gate added in #5421.)_
   
   ### Proposed Solution or Design
   
   **Phase 1 — Authentication (ext-authz in front of the agent-service)**
   - Add an Envoy `SecurityPolicy` targeting the `*-agent-service-route` 
HTTPRoute, mirroring `gateway-security-policy.yaml`, pointing ext-auth at the 
access-control-service `/api/auth` and injecting `x-user-id` / `x-user-name` / 
`x-user-email`.
   - Add a branch to `AccessControlResource.authorize()` for `/api/agents…` 
that verifies the JWT (`parseToken`) and requires `REGULAR`/`ADMIN` 
(`SessionUser.isRoleOf`) — i.e. restore the gate the old LiteLLM proxy had — 
returning the trusted identity headers on success and 401/403 otherwise. Token 
extraction already supports header `Authorization: Bearer`, query 
`access-token` (for the WS), and body `token`.
   - Frontend: attach the JWT to **all** agent-service calls. Today 
`agentHeaders()` only sets `X-Agent-Workflow-Id`, `fetchModelTypes()` sends 
nothing, and the `/react` WebSocket carries no token. REST → `Authorization: 
Bearer`; WS → `?access-token=…` (mirror the workflow WebSocket).
   - Remove the agent-service's home-grown `decodeJWT` / `validateToken`; have 
it **trust the gateway-injected `x-user-*` headers**, which only exist after 
ext-authz passes.
   - Single-node (nginx, no Envoy): add an `auth_request` subrequest to 
`/api/auth`, or explicitly accept that dev single-node stays open.
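
   The `/api/agents…` branch described above can be sketched as a pure decision function. This is a TypeScript illustration, not the actual `AccessControlResource.authorize()` code: the names `VerifiedUser`, `AuthzRequest`, `extractToken`, and `authorizeAgentRequest` are hypothetical, the `verify` callback stands in for `JwtParser.parseToken`, and both the lookup order (header, then query, then body) and the split between 401 (no valid token) and 403 (wrong role) are assumptions.

```typescript
// Identity returned by signature verification. Field names are hypothetical.
interface VerifiedUser {
  uid: number;
  name: string;
  email: string;
  role: string; // e.g. "REGULAR" or "ADMIN"
}

// What the ext-authz check is assumed to see of the original request.
interface AuthzRequest {
  path: string; // original path plus query string
  headers: Readonly<Record<string, string | undefined>>; // names lower-cased
  bodyToken?: string; // `token` field of the body, if any
}

type AuthzResult =
  | { status: 200; headers: Record<string, string> }
  | { status: 401 | 403 };

// Looks for the token in the three places the issue lists.
// The precedence between them is an assumption.
function extractToken(req: AuthzRequest): string | undefined {
  const auth = req.headers["authorization"];
  const bearer = auth?.startsWith("Bearer ") ? auth.slice("Bearer ".length).trim() : "";
  // The base URL is a placeholder; it only lets URL parse a relative path.
  const fromQuery =
    new URL(req.path, "http://gateway.invalid").searchParams.get("access-token") ?? "";
  return bearer || fromQuery || req.bodyToken || undefined;
}

function authorizeAgentRequest(
  req: AuthzRequest,
  verify: (token: string) => VerifiedUser | undefined, // stand-in for JwtParser.parseToken
): AuthzResult {
  const token = extractToken(req);
  const user = token === undefined ? undefined : verify(token);
  if (user === undefined) return { status: 401 }; // missing, forged, or expired token
  if (user.role !== "REGULAR" && user.role !== "ADMIN") return { status: 403 };
  return {
    status: 200,
    headers: {
      "x-user-id": String(user.uid),
      "x-user-name": user.name,
      "x-user-email": user.email,
    },
  };
}
```

   For the `/react` WebSocket the same function applies with the token taken from the query string, e.g. `authorizeAgentRequest({ path: "/api/agents/<id>/react?access-token=<jwt>", headers: {} }, verify)`.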
   
   **Phase 2 — Authorization (per-agent access)**
   - Introduce an agent **ownership / ACL** model: associate each agent with 
its creating user (uid). Agents are currently in-memory only (`agentStore`), 
lost on restart and not tied to a user — so this needs a durable mapping.
   - The access-control-service answers "may uid X act on agent Y?". Options:
     - **(a)** access-control reads an agent→owner mapping from a shared store 
(DB) and decides — keeps the authz decision fully inside access-control 
(matches the stated goal). The agent id is available from the request path 
(`/api/agents/{id}/…`), which ext-auth already receives, exactly like `cuid` 
today.
     - **(b)** Hybrid: access-control injects the trusted uid and the 
agent-service enforces ownership locally. Simpler, but the authz logic lives 
outside access-control.
   - This also sets up future agent sharing/granting (like dataset / workflow 
ACLs).
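
   Whichever option is chosen, the question "may uid X act on agent Y?" reduces to parsing the agent id out of the path and consulting an ownership mapping. A minimal sketch, assuming owner-only access: `AgentOwnershipIndex`, `agentIdFromPath`, and `mayAccessPath` are hypothetical names, the in-memory `Map` merely stands in for the durable store this phase calls for, and denying unknown agents is an assumption.

```typescript
// Segments directly under /api/agents that name a collection endpoint, not an agent.
const NON_AGENT_SEGMENTS = new Set(["models"]);

// Extracts {id} from /api/agents/{id} or /api/agents/{id}/...; undefined otherwise.
function agentIdFromPath(path: string): string | undefined {
  // The base URL is a placeholder; it only lets URL parse a relative path.
  const pathname = new URL(path, "http://gateway.invalid").pathname;
  const match = /^\/api\/agents\/([^/]+)/.exec(pathname);
  const id = match?.[1];
  return id === undefined || NON_AGENT_SEGMENTS.has(id) ? undefined : id;
}

class AgentOwnershipIndex {
  private readonly owners = new Map<string, number>();

  // Record the creating user when an agent is created.
  register(agentId: string, ownerUid: number): void {
    this.owners.set(agentId, ownerUid);
  }

  // Owner-only rule; agents with no recorded owner are denied.
  mayAccess(uid: number, agentId: string): boolean {
    return this.owners.get(agentId) === uid;
  }
}

// Collection endpoints carry no agent id, so authentication alone suffices there.
function mayAccessPath(index: AgentOwnershipIndex, uid: number, path: string): boolean {
  const agentId = agentIdFromPath(path);
  return agentId === undefined || index.mayAccess(uid, agentId);
}
```

   Under option (a) this check would run inside the access-control-service with the uid taken from the verified JWT; under option (b) the agent-service would run the same check with the uid taken from the injected `x-user-id` header.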
   
   **Open design questions**
   - Where/how to persist agent ownership (a dedicated table vs. persisting 
agents themselves; agents are ephemeral today).
   - Does Phase 2 require persisting agents, or just an ownership index keyed 
by agent id?
   - Keep the existing `userToken` body field — it is the user's token 
forwarded to the dashboard-service for delegated workflow CRUD, which is 
separate from authenticating the caller.
   
   ### Affected Area
   - Deployment / Infrastructure (Envoy `SecurityPolicy`, nginx)
   - Workflow UI (JWT propagation for REST + WebSocket)
   - access-control-service and agent-service
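
   The JWT propagation listed for the Workflow UI could look like the sketch below. The function names, the `jwt` and `workflowId` parameters, and the example host are assumptions; the `X-Agent-Workflow-Id` header, `Authorization: Bearer`, the `access-token` query parameter, and the `/api/agents/:id/react` path come from the proposal.

```typescript
// REST: keep the existing workflow header and add the bearer token.
function agentAuthHeaders(jwt: string, workflowId?: number): Record<string, string> {
  const headers: Record<string, string> = { Authorization: `Bearer ${jwt}` };
  if (workflowId !== undefined) headers["X-Agent-Workflow-Id"] = String(workflowId);
  return headers;
}

// WebSocket: the browser WebSocket API does not accept custom request headers,
// so the token travels in the query string instead.
function agentReactSocketUrl(wsBase: string, agentId: string, jwt: string): string {
  const socketUrl = new URL(`/api/agents/${encodeURIComponent(agentId)}/react`, wsBase);
  socketUrl.searchParams.set("access-token", jwt);
  return socketUrl.toString();
}

// Possible call sites (host name is a placeholder):
//   fetch("/api/agents/models", { headers: agentAuthHeaders(jwt) });
//   new WebSocket(agentReactSocketUrl("wss://texera.example", agentId, jwt));
```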
   

