aicam opened a new issue, #5630:
URL: https://github.com/apache/texera/issues/5630

   ### Feature Summary
   
   Since the cluster networking was unified under a single **Envoy Gateway** 
(#4191) and the **access-control-service** was added as the 
external-authorization (ext-auth) service for computing-unit traffic (#3598), 
the access-control-service is the single component that decides which upstream 
a user's computing-unit request is routed to. On every authorized request it 
returns a `Host` header that the gateway uses as the routing target, and Envoy 
forwards the upgraded connection there.
   
   Today the service does **not** route to a URI it was given — it 
**reconstructs** the target from `KubernetesConfig` (pool name, namespace, 
port) using the in-cluster Kubernetes DNS convention:
   
   ```
   computing-unit-<cuid>.<pool>-svc.<namespace>.svc.cluster.local:<port>
   ```
   
   This hard-wires every computing unit to the local cluster under a fixed 
naming convention and duplicates the address-construction logic that already 
lives in `KubernetesClient.generatePodURI`. As a result, computing units cannot 
be distributed to arbitrary locations, such as a remote node, a different 
cluster, or a host **outside the cluster**.
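
To make the current behavior concrete, here is a minimal Python sketch of the 
reconstruction (the real service is not written in Python). The function name 
and the sample values are hypothetical; only the address template comes from 
this issue.

```python
def in_cluster_address(cuid: int, pool: str, namespace: str, port: int) -> str:
    """Rebuild the in-cluster DNS name of a computing unit from config values.

    Hypothetical helper: it only mirrors the naming convention quoted in this
    issue and never consults any URI stored for the unit.
    """
    return f"computing-unit-{cuid}.{pool}-svc.{namespace}.svc.cluster.local:{port}"


# Sample values are made up for illustration.
print(in_cluster_address(42, "texera-pool", "texera", 8085))
```

Because the result depends only on the unit id and cluster-wide config, a unit 
that lives anywhere else can never be addressed this way.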
   
   To enable CU distribution, the access-control-service should accept and 
route to **any** URI recorded for a computing unit, instead of assuming the 
in-cluster address.
   
   ### Proposed Solution or Design
   
   **Why this is possible with Envoy Gateway.** Envoy Gateway's ext-auth 
`SecurityPolicy` lets an external service authorize each request and contribute 
headers; in Texera's setup the access-control-service also supplies the 
upstream `Host`. Envoy's **dynamic forward proxy** — a `Backend` of type 
`DynamicResolver` — then resolves and forwards to an arbitrary `host:port` 
(FQDN or IP) determined at request time from that header. In other words, the 
routing target is whatever the access-control-service returns; it does **not** 
have to be an in-cluster `*.svc.cluster.local` address. (Refs: Envoy Gateway 
[External 
Authorization](https://gateway.envoyproxy.io/docs/tasks/security/ext-auth/) and 
[Backend Routing / Dynamic 
Resolver](https://gateway.envoyproxy.io/docs/tasks/traffic/backend/).)
   
   **Proposed change.**
   
   1. In `AccessControlResource`, resolve the routing target from the **URI 
persisted for the computing unit** (the `workflow_computing_unit` row, written 
by the managing service via `KubernetesClient.generatePodURI`) — a single 
source of truth — instead of reconstructing it from `KubernetesConfig`.
   2. Keep the previously constructed in-cluster address as a **fallback** for 
units that do not yet have a recorded URI, so existing behavior is preserved.
   3. Make the recorded URI complete by **including the port** in 
`generatePodURI` (the pod's container listens on `computeUnitPortNumber`, but 
the stored URI currently omits it), so that the value the access-control-service 
routes to is directly connectable.
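
The three steps above can be sketched as follows. This is an illustrative 
Python sketch, not the actual change: `generate_pod_uri` and 
`resolve_routing_target` are hypothetical names, and it assumes the recorded 
value is stored as a plain `host:port` string.

```python
from typing import Optional


def generate_pod_uri(cuid: int, pool: str, namespace: str, port: int) -> str:
    """Hypothetical stand-in for generatePodURI that includes the port (step 3)."""
    return f"computing-unit-{cuid}.{pool}-svc.{namespace}.svc.cluster.local:{port}"


def resolve_routing_target(recorded_uri: Optional[str], cuid: int, pool: str,
                           namespace: str, port: int) -> str:
    """Prefer the URI recorded for the unit (step 1), else fall back (step 2)."""
    if recorded_uri and recorded_uri.strip():
        return recorded_uri.strip()
    # No recorded URI yet: keep the previous in-cluster behavior.
    return generate_pod_uri(cuid, pool, namespace, port)


# A unit recorded at an out-of-cluster host is routed there unchanged.
print(resolve_routing_target("cu-7.example.org:8085", 7, "pool", "ns", 8085))
# A unit without a recorded URI still gets the in-cluster address.
print(resolve_routing_target(None, 7, "pool", "ns", 8085))
```

Under these assumptions the returned string is what the service would place in 
the `Host` header, so existing units keep working while new ones can point 
anywhere the gateway is allowed to reach.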
   
   **Operational notes / prerequisites** (from the Envoy Gateway docs above): 
the `DynamicResolver` Backend is disabled by default and must be explicitly 
enabled with appropriate RBAC; loopback hosts (`localhost`, `127.0.0.1`, `::1`) 
are denied by default; and routing to out-of-cluster targets additionally 
requires the corresponding network egress to be allowed.
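
Since loopback targets are denied by default, a recorded URI pointing at one 
would never be routable. The Python sketch below shows one way to recognize 
such targets from a `host:port` string. The helper is hypothetical and is not 
part of Envoy Gateway or of the proposed change; the gateway applies its own 
check regardless.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_target(target: str) -> bool:
    """Return True if a host:port target names a loopback host.

    Hypothetical pre-check for illustration only.
    """
    # Prefixing "//" lets urlsplit parse "host:port" and "[::1]:port" forms.
    host = urlsplit("//" + target).hostname or target
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal, so treat it as an ordinary hostname.
        return False


for candidate in ("localhost:8085", "127.0.0.1:8085", "[::1]:8085",
                  "cu-7.example.org:8085"):
    print(candidate, is_loopback_target(candidate))
```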
   
   ### Affected Area
   
   - Deployment / Infrastructure
   
   ---
   
   Addressed by #5629.
   
   By submitting this issue, you agree to follow the [Apache Code of 
Conduct](https://www.apache.org/foundation/policies/conduct).

