flyrain commented on issue #3786: URL: https://github.com/apache/polaris/issues/3786#issuecomment-4503583696
Copied the architectural direction doc from #3990 to here per [dev mailing discussion](https://lists.apache.org/thread/r3qy9sm3nmzrjh12t6hyrl04xcq3hklq): A Delegation Service (D.S.) is a service that works alongside Polaris, either driving Polaris from outside or running inside the Polaris deployment to do work on its behalf. It can be deployed in one of two modes depending on who runs it and which way the calls flow: - **Pull**: the delegation service runs outside Polaris (e.g. a scheduled compaction or snapshot-expiration job) and calls Polaris over REST to fetch policies, table metadata, etc. - **Push**: the delegation service is co-deployed with Polaris, inside the same security boundary. Polaris invokes it for heavy workloads that would otherwise degrade the Polaris service, such as intensive network calls, large I/O operations, or compute-heavy tasks. The delegation service is hidden behind the Polaris deployment; clients cannot reach it and do not need to know whether one is configured or which implementation is in use. The two modes solve different problems. Pull supports external systems that integrate with Polaris. Push lets a Polaris deployment offload internal work (e.g. table file purge on `DROP ... PURGE`, server-side scan planning) without changing the public API. A single deployment can use both. ``` Pull mode (delegation service is external): ┌──────────────────────┐ REST (pull) ┌────────────┐ │ Delegation service │ ─────────────────► │ Polaris │ │ (compute engine, │ │ │ │ maintenance job) │ └────────────┘ └──────────────────────┘ Push mode (delegation service is internal, invisible to clients): ┌── Polaris deployment ─────────────────┐ │ │ ┌────────┐ REST │ ┌──────────┐ internal ┌──────┐ │ │ Client │ ────────► │ ─►│ Polaris │ ────────────► │ D.S. │ │ └────────┘ │ └──────────┘ └──────┘ │ │ │ └───────────────────────────────────────┘ ``` ## Pull mode Pull mode is the natural fit for **table maintenance services**: data compaction, snapshot expiration, orphan file removal, manifest rewriting, and similar background jobs. These services run on their own schedule, decide which tables to act on, and need policies and metadata from Polaris to drive that work. In pull mode, the delegation service talks to Polaris **exclusively over REST APIs**: the Iceberg REST Catalog (IRC) endpoints for table operations (load, commit, list, credential vending), and the Polaris REST endpoints for catalog-specific resources such as policies and generic tables. There is no SDK, callback; every interaction is an outbound HTTP request. Authentication uses OAuth2, the standard Polaris auth path; The full REST surface is in the [API specs](#api-specs). ### Example: external compaction service ``` 1. POST /v1/oauth/tokens (auth) 2. GET /v1/{cat}/namespaces/{ns}/tables (discover tables) 3. GET /polaris/v1/{cat}/applicable-policies (pull policy) ?namespace={ns}&target-name={tbl} &policyType=system.data-compaction 4. If "enable": true, run compaction with the parameters from policy.content 5. Repeat on schedule ``` ## Push mode In push mode, the delegation service is co-deployed with the Polaris, in the same security boundary as Polaris itself. Polaris invokes it for heavy workloads that would otherwise degrade the Polaris service, such as intensive network calls, large I/O operations, or compute-heavy tasks. External clients cannot reach the delegation service directly, and they cannot tell whether or which one is deployed; Polaris remains the only public entry point. ### Properties - **Same security boundary as Polaris.** The delegation service is reachable only by Polaris, deployed alongside it (e.g., separate pod within the same trust zone). It can be granted credentials and access that would be unsafe to vend to clients. - **No public contract.** The wire protocol between Polaris and the delegation service is internal and may evolve. Clients see only the Polaris REST API. - **Pluggable, opaque to clients.** Whether a delegation service is configured, and which implementation runs (e.g. an async worker for purge, an engine-aware planner for scan planning), is a deployment-time decision. The same client request behaves identically from the client's point of view regardless of which one is in use. ### Use cases #### Drop with purge When a client issues a drop with `purge=true`, Polaris must remove the catalog entry **and** delete the table's files. Doing the file deletion in the request thread couples client latency to (potentially large) storage operations and saturates the outbound bandwidth of the Polaris server. In push mode, Polaris records the drop and hands the purge job to the delegation service, which executes it asynchronously off the Polaris request path. The client sees the table gone immediately; the file deletion completes behind the scenes. #### Server-side scan planning Server-side scan planning lets Polaris produce the file list for a query, instead of the engine reading table metadata directly. Two motivations: exposing non-Iceberg table formats through the Iceberg REST API, and sharing planning caches across queries. When enabled, Polaris invokes the delegation service to plan the scan, then returns the result over the IRC scan-planning endpoints. The client never knows whether the planning ran inside Polaris or in a delegation service. ## Choosing pull vs push | | Pull | Push | |-----------------------|-------------------------------------------------------------------------------|---------------------------------------------------------| | Where the D.S. runs | Outside Polaris, owned by the caller | Co-deployed with Polaris, in the same security boundary | | Direction of calls | D.S. → Polaris | Polaris → D.S. | | Visibility to clients | Visible (the D.S. *is* the client) | Hidden; clients only see Polaris | | Wire protocol | Polaris REST (IRC + Polaris APIs) | Internal, private to the deployment | | Typical use cases | Table maintenance jobs (compaction, snapshot expiration, orphan file cleanup) | Drop with purge, server-side scan planning | The two modes are independent: a deployment can use one, the other, or both. ## API specs For the full REST surface, refer to the OpenAPI specs: - [Iceberg REST Catalog](https://github.com/apache/polaris/blob/main/spec/iceberg-rest-catalog-open-api.yaml): IRC endpoints (table create, load, list, commit, credential vending). - [Polaris management API](https://github.com/apache/polaris/blob/main/spec/polaris-management-service.yml): catalogs, principals, roles, grants. - [Polaris policy API](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/policy-apis.yaml): policy CRUD and applicable-policies. - [Polaris generic-tables API](https://github.com/apache/polaris/blob/main/spec/polaris-catalog-apis/generic-tables-api.yaml): generic-table CRUD. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
