aicam opened a new issue, #5630: URL: https://github.com/apache/texera/issues/5630
### Feature Summary

Since the cluster networking was unified under a single **Envoy Gateway** (#4191) and the **access-control-service** was added as the external-authorization (ext-auth) service for computing-unit traffic (#3598), the access-control-service is the single component that decides which upstream a user's computing-unit request is routed to. On every authorized request it returns a `Host` header that the gateway uses as the routing target, and Envoy forwards the upgraded connection there.

Today the service does **not** route to a URI it was given; it **reconstructs** the target from `KubernetesConfig` (pool name, namespace, port) using the in-cluster Kubernetes DNS convention:

```
computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
```

This hard-wires every computing unit to the local cluster under a fixed naming convention, and duplicates the address-construction logic that already lives in `KubernetesClient.generatePodURI`. It makes it impossible to distribute computing units to arbitrary locations, such as a CU running on a remote node, in a different cluster, or on a host **outside the cluster**.

To enable CU distribution, the access-control-service should accept and route to **any** URI recorded for a computing unit, instead of assuming the in-cluster address.

### Proposed Solution or Design

**Why this is possible with Envoy Gateway.** Envoy Gateway's ext-auth `SecurityPolicy` lets an external service authorize each request and contribute headers; in Texera's setup the access-control-service also supplies the upstream `Host`. Envoy's **dynamic forward proxy**, a `Backend` of type `DynamicResolver`, then resolves and forwards to an arbitrary `host:port` (FQDN or IP) determined at request time from that header. In other words, the routing target is whatever the access-control-service returns; it does **not** have to be an in-cluster `*.svc.cluster.local` address.
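
To make the `host:port` requirement concrete, here is a minimal Python sketch of turning a recorded URI into the value returned in the `Host` header. It is illustrative only and is not the service's actual code: the function name `host_header_from_uri`, the `default_port` parameter, and the assumption that a recorded URI may or may not carry a scheme or a port are all hypothetical.

```python
from urllib.parse import urlsplit


def host_header_from_uri(uri: str, default_port: int) -> str:
    """Return a `host:port` string for a recorded computing-unit URI.

    Hypothetical helper: the stored URI format is an assumption here, so both
    scheme-qualified ("http://host:port") and bare ("host:port") forms are accepted.
    """
    # urlsplit only fills in the network location when "//" is present,
    # so a bare "host:port" value is prefixed before parsing.
    parts = urlsplit(uri if "//" in uri else f"//{uri}")
    if parts.hostname is None:
        raise ValueError(f"no host in URI: {uri!r}")

    # Fall back to the caller-supplied port when the URI does not carry one.
    port = parts.port if parts.port is not None else default_port

    # IPv6 literals must be bracketed again in a host:port pair.
    host = f"[{parts.hostname}]" if ":" in parts.hostname else parts.hostname
    return f"{host}:{port}"
```

An FQDN, an IPv4 address, and an in-cluster `*.svc.cluster.local` name all pass through the same path, which is the point of routing to whatever was recorded rather than to a reconstructed name.
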
(Refs: Envoy Gateway [External Authorization](https://gateway.envoyproxy.io/docs/tasks/security/ext-auth/) and [Backend Routing / Dynamic Resolver](https://gateway.envoyproxy.io/docs/tasks/traffic/backend/).)

**Proposed change.**

1. In `AccessControlResource`, resolve the routing target from the **URI persisted for the computing unit** (the `workflow_computing_unit` row, written by the managing service via `KubernetesClient.generatePodURI`), which serves as a single source of truth, instead of reconstructing it from `KubernetesConfig`.
2. Keep the previously constructed in-cluster address as a **fallback** for units that do not yet have a recorded URI, so existing behavior is preserved.
3. Make the recorded URI complete by **including the port** in `generatePodURI` (the pod's container listens on `computeUnitPortNumber`, but the stored URI omitted it), so the value the access-control-service routes to is directly connectable.

**Operational notes / prerequisites** (from the Envoy Gateway docs above): the `DynamicResolver` Backend is disabled by default and must be explicitly enabled with appropriate RBAC; loopback hosts (`localhost`, `127.0.0.1`, `::1`) are denied by default; and routing to out-of-cluster targets additionally requires the corresponding network egress to be allowed.

### Affected Area

- Deployment / Infrastructure

---

Addressed by #5629.

By submitting this issue, you agree to follow the [Apache Code of Conduct](https://www.apache.org/foundation/policies/conduct).
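
Steps 1 and 2 of the proposed change (prefer the recorded URI, fall back to the in-cluster address) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the code in `AccessControlResource`: `KubernetesConfigSketch`, `in_cluster_address`, and `resolve_routing_target` are hypothetical names, and treating an empty or whitespace-only value as "no recorded URI" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class KubernetesConfigSketch:
    """Hypothetical stand-in for the pool name, namespace, and port in KubernetesConfig."""
    pool_name: str
    namespace: str
    port: int


def in_cluster_address(cuid: int, cfg: KubernetesConfigSketch) -> str:
    # Follows the in-cluster DNS convention quoted in the issue:
    # computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
    return (
        f"computing-unit-{cuid}.{cfg.pool_name}-svc."
        f"{cfg.namespace}.svc.cluster.local:{cfg.port}"
    )


def resolve_routing_target(
    cuid: int, recorded_uri: Optional[str], cfg: KubernetesConfigSketch
) -> str:
    # Step 1: the recorded URI, when present, is the single source of truth.
    if recorded_uri and recorded_uri.strip():
        return recorded_uri.strip()
    # Step 2: units without a recorded URI keep the existing in-cluster target.
    return in_cluster_address(cuid, cfg)
```

With this shape, a unit recorded at an out-of-cluster host is routed there unchanged, while a unit with no recorded URI still resolves to the same address the service constructs today.
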
