flyrain commented on code in PR #3990: URL: https://github.com/apache/polaris/pull/3990#discussion_r2991633060
########## site/content/in-dev/unreleased/delegation-service.md: ########## @@ -0,0 +1,733 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Delegation Service +type: docs +weight: 430 +--- + +The Polaris Delegation Service enables external services — compute engines, orchestrators, policy enforcement points, and +maintenance tools — to interact with Polaris as a centralized governance and metadata authority. + +External services can integrate with Polaris using three complementary patterns: + +- **Pull**: The external service queries Polaris on demand to fetch policies, configuration, or metadata. +- **Push**: The external service sends notifications to Polaris to inform it of changes happening externally. +- **Event**: Polaris emits internal events at every operation boundary, enabling custom listeners to bridge Polaris with external systems reactively. 
+ +## Architecture Overview + +``` + ┌──────────────────────┐ ┌─────────────────────┐ + │ External Service │ │ External Catalog │ + │ (Spark, Trino, │ │ (Hive, Glue, │ + │ Flink, custom) │ │ custom REST) │ + └──────┬───────────────┘ └──────┬──────────────┘ + │ │ + │ PULL: query policies, │ PUSH: send table + │ metadata, configuration │ change notifications + │ │ + ▼ ▼ + ┌──────────────────────────────────────────────────────┐ + │ Apache Polaris │ + │ │ + │ ┌──────────┐ ┌─────────────┐ ┌────────────────┐ │ + │ │ Policy │ │ Notification│ │ Event Listener │ │ + │ │ API │ │ API │ │ Framework │ │ + │ └──────────┘ └─────────────┘ └───────┬────────┘ │ + │ │ │ + └─────────────────────────────────────────┼───────────┘ + │ + EVENT: BEFORE/AFTER + hooks on all operations + │ + ▼ + ┌──────────────────┐ + │ Custom Listeners │ + │ (Kafka, webhooks, │ + │ CloudWatch, ...) │ + └──────────────────┘ +``` + +## Pull vs Push — Choosing the Right Integration Pattern + +The pull and push patterns serve different purposes and have different trade-offs. Understanding when to use +each — or combine them — is key to building a robust integration with Polaris. + +### Pull: Service-Driven Queries + +In the pull model, the external service decides **when** and **what** to query. This gives the service full +control over timing and request volume. 
+ +**Best suited for:** +- Compute engines that need policies or metadata at query planning time (e.g., Spark, Trino, Flink) +- Maintenance services that run on a fixed schedule (e.g., a compaction job that runs every hour) +- Services that need to discover resources dynamically (listing namespaces and tables) +- Scenarios where the external service is the source of truth for when work should happen + +**Advantages:** +- Simple to implement — standard REST API calls +- No infrastructure needed beyond an HTTP client +- Service controls its own polling frequency and backoff strategy +- Naturally resilient — a missed poll just delays the next check, nothing is lost + +**Considerations:** +- Introduces polling latency — changes are not detected until the next poll cycle +- Generates load on Polaris proportional to the number of services and their polling frequency +- Use the `version` field on policies to avoid redundant reprocessing (see [Polling and Versioning](#polling-and-versioning)) + +### Push: External-Catalog-Driven Notifications + +In the push model, the external system sends notifications to Polaris the moment a change occurs. Polaris +does not poll the external system — it reacts to incoming notifications. 
+ +**Best suited for:** +- External catalogs (Hive Metastore, AWS Glue, custom REST catalogs) that manage table lifecycle outside Polaris +- Scenarios where Polaris must mirror metadata from another system with minimal delay +- Multi-catalog environments where Polaris is the unified governance layer but does not own all table metadata + +**Advantages:** +- Near-real-time synchronization — Polaris learns about changes as soon as they happen +- No polling overhead on either side +- Timestamp-based ordering ensures consistency even under concurrent updates +- `VALIDATE` pre-checks prevent wasted work when permissions or locations are misconfigured + +**Considerations:** +- Requires the external system to implement notification sending logic (e.g., a Hive hook or a CDC pipeline) +- The caller must manage monotonically increasing timestamps and handle `409 Conflict` rejections +- Only applies to table metadata synchronization — not for querying policies or configuration + +### Combining Pull and Push with Event Listeners + +For the most responsive integrations, combine pull and push with the event listener pattern: + +``` + ┌──────────────┐ 1. PUSH notification ┌────────────┐ + │ External │ ─────────────────────────→ │ Polaris │ + │ Catalog │ │ │ + └──────────────┘ │ 2. Fires │ + │ AFTER_ │ + │ SEND_ │ + │ NOTIF. │ + └─────┬──────┘ + │ + 3. Event listener │ + forwards to Kafka │ + ▼ + ┌──────────────┐ 4. Consumes event ┌────────────┐ + │ Maintenance │ ←──────────────────────────│ Kafka │ + │ Service │ └────────────┘ + │ │ 5. PULL: fetch policies + │ │ ─────────────────────────→ Polaris + └──────────────┘ +``` + +In this pattern: +1. An external catalog **pushes** metadata changes to Polaris via notifications. +2. Polaris fires `AFTER_SEND_NOTIFICATION` events, which a custom listener forwards to a message queue. +3. A maintenance service consumes the event and **pulls** the applicable policies to decide what action to take. 
+ +This eliminates polling latency while keeping the maintenance service decoupled from the notification flow. + +## Access Delegation Modes + +When external services interact with Polaris to access table data, Polaris supports two credential +delegation mechanisms via the `X-Iceberg-Access-Delegation` HTTP header, as defined by the Iceberg +REST API specification: + +| Mode | Header Value | Description | +|------|-------------|-------------| +| **Vended Credentials** | `vended-credentials` | Polaris generates short-lived, scoped credentials (e.g., AWS STS tokens) and sends them to the client. The client uses these credentials directly to access storage. | +| **Remote Signing** | `remote-signing` | The client sends data access requests back to Polaris, which signs them on behalf of the client. The client never receives raw storage credentials. | + +The client specifies the desired mode(s) as a comma-separated list in the `X-Iceberg-Access-Delegation` +header. For backward compatibility, the legacy value `true` is treated as `vended-credentials`. + +## Pull Pattern — Querying Polaris from an External Service + +In the pull model, an external service authenticates with Polaris and calls its REST APIs on demand to retrieve +policies, table metadata, or configuration. This is the most common integration pattern for compute engines and +maintenance services. + +### Authentication + +External services authenticate using OAuth2 tokens: + +```bash +curl -X POST https://polaris.example.com/v1/oauth/tokens \ + -d "grant_type=client_credentials" \ + -d "client_id=<service-client-id>" \ + -d "client_secret=<service-client-secret>" \ + -d "scope=PRINCIPAL_ROLE:ALL" +``` + +Polaris also supports delegating authentication to an external Identity Provider (IdP) via OIDC. See +[External IdP](managing-security/external-idp/) for details. + +### Fetching Applicable Policies + +The primary pull endpoint for external services is the **applicable policies** API. 
It resolves the full policy +hierarchy (including inheritance from catalog and namespace levels) for a given resource: + +``` +GET /polaris/v1/{catalog}/applicable-policies?namespace={ns}&target-name={table}&policyType={type} +``` + +Parameters: +- `namespace` — the target namespace (use `%1F` as separator for multi-level namespaces) +- `target-name` — the table or view name (omit for namespace-level or catalog-level queries) +- `policyType` — optional filter (e.g., `system.data-compaction`) +- `page-token`, `page-size` — for paginating large result sets + +**Policy inheritance** works as follows: when you query applicable policies for a table, Polaris resolves +policies attached at the catalog level, the namespace level (including parent namespaces), and the table +level itself. Policies marked as `inheritable: true` propagate down the hierarchy. The `inherited` field in +the response indicates whether a policy was directly attached to the target or inherited from a parent scope. + +**Example response:** +```json +{ + "applicable-policies": [ + { + "name": "compaction-policy", + "policy-type": "system.data-compaction", + "inherited": true, + "namespace": ["analytics"], + "version": 3, + "inheritable": true, + "content": "{\"version\": \"2025-02-03\", \"enable\": true, \"config\": {\"target_file_size_bytes\": 134217728}}" + } + ] +} +``` + +### Managing Policies + +Beyond reading policies, external services with appropriate permissions can manage the full policy lifecycle: + +| Operation | Endpoint | Method | +|-----------|----------|--------| +| Create a policy | `/polaris/v1/{catalog}/namespaces/{ns}/policies` | `POST` | +| List policies in a namespace | `/polaris/v1/{catalog}/namespaces/{ns}/policies` | `GET` | +| Load a specific policy | `/polaris/v1/{catalog}/namespaces/{ns}/policies/{name}` | `GET` | +| Update a policy | `/polaris/v1/{catalog}/namespaces/{ns}/policies/{name}` | `PATCH` | +| Drop a policy | 
`/polaris/v1/{catalog}/namespaces/{ns}/policies/{name}` | `DELETE` | +| Attach a policy to a resource | `/polaris/v1/{catalog}/namespaces/{ns}/policies/{name}/mappings` | `PUT` | +| Detach a policy from a resource | `/polaris/v1/{catalog}/namespaces/{ns}/policies/{name}/mappings` | `POST` | + +### Example: External Compaction Service + +A typical pull-based integration for a data compaction service: + +``` +1. Authenticate + POST /v1/oauth/tokens → obtain bearer token + +2. Discover tables + GET /v1/{catalog}/namespaces/{ns}/tables → list of tables + +3. Fetch compaction policies for each table + GET /polaris/v1/{catalog}/applicable-policies + ?namespace={ns}&target-name={table}&policyType=system.data-compaction + +4. Execute compaction + If the policy has "enable": true, run compaction using the parameters + from the policy content (target_file_size_bytes, compaction_strategy, etc.) + +5. Repeat on schedule + Poll periodically or use an event-driven trigger to detect policy changes. +``` + +### Polling and Versioning + +Policies include a `version` field that increments on every update. External services can use this to detect +changes efficiently: + +1. Cache the last-seen version for each policy. +2. Periodically call the applicable-policies endpoint. +3. Compare versions — only re-process resources whose policy version has changed. + +For real-time change detection, combine the pull pattern with the [event listener pattern](#event-listeners--reacting-to-polaris-operations): +register a listener for `AFTER_ATTACH_POLICY` or `AFTER_UPDATE_POLICY` events and trigger a policy refresh in +the external service when relevant events fire. + +## Push Pattern — Sending Notifications to Polaris

Review Comment: Thanks @jbonofre for the explanation. Could we add a clearer description of the push model in the context of the delegation service? I think that is one of the key points this doc is expected to clarify.

-- This is an automated message from the Apache Git Service.
