dimas-b commented on code in PR #3990:
URL: https://github.com/apache/polaris/pull/3990#discussion_r2925589349
##########
site/content/in-dev/unreleased/delegation-service.md:
##########
@@ -0,0 +1,733 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Delegation Service
+type: docs
+weight: 430
+---
+
+The Polaris Delegation Service enables external services — compute engines, orchestrators, policy enforcement points, and
+maintenance tools — to interact with Polaris as a centralized governance and metadata authority.
+
+External services can integrate with Polaris using three complementary patterns:
+
+- **Pull**: The external service queries Polaris on demand to fetch policies, configuration, or metadata.
+- **Push**: The external service sends notifications to Polaris to inform it of changes happening externally.
+- **Event**: Polaris emits internal events at every operation boundary, enabling custom listeners to bridge Polaris with external systems reactively.
+
+## Architecture Overview
+
+```
+ ┌──────────────────────┐        ┌─────────────────────┐
+ │   External Service   │        │  External Catalog   │
+ │   (Spark, Trino,     │        │  (Hive, Glue,       │
+ │    Flink, custom)    │        │   custom REST)      │
+ └──────┬───────────────┘        └──────┬──────────────┘
+        │                               │
+        │ PULL: query policies,         │ PUSH: send table
+        │ metadata, configuration       │ change notifications
+        │                               │
+        ▼                               ▼
+ ┌─────────────────────────────────────────────────────┐
+ │                    Apache Polaris                   │
+ │                                                     │
+ │  ┌──────────┐  ┌─────────────┐  ┌────────────────┐  │
+ │  │  Policy  │  │ Notification│  │ Event Listener │  │
+ │  │   API    │  │     API     │  │   Framework    │  │
+ │  └──────────┘  └─────────────┘  └───────┬────────┘  │
+ │                                         │           │
+ └─────────────────────────────────────────┼───────────┘
+                                           │
+                                  EVENT: BEFORE/AFTER
+                                hooks on all operations
+                                           │
+                                           ▼
+                                 ┌──────────────────┐
+                                 │ Custom Listeners │
+                                 │ (Kafka, webhooks, │
+                                 │ CloudWatch, ...) │
+                                 └──────────────────┘
+```
+
+## Pull vs Push — Choosing the Right Integration Pattern
+
+The pull and push patterns serve different purposes and have different trade-offs. Understanding when to use
+each — or combine them — is key to building a robust integration with Polaris.
+
+### Pull: Service-Driven Queries
+
+In the pull model, the external service decides **when** and **what** to query. This gives the service full
+control over timing and request volume.
+
+**Best suited for:**
+- Compute engines that need policies or metadata at query planning time (e.g., Spark, Trino, Flink)
+- Maintenance services that run on a fixed schedule (e.g., a compaction job that runs every hour)
+- Services that need to discover resources dynamically (listing namespaces and tables)
+- Scenarios where the external service is the source of truth for when work should happen
+
+**Advantages:**
+- Simple to implement — standard REST API calls
+- No infrastructure needed beyond an HTTP client
+- Service controls its own polling frequency and backoff strategy
+- Naturally resilient — a missed poll just delays the next check, nothing is lost
+
+**Considerations:**
+- Introduces polling latency — changes are not detected until the next poll cycle
+- Generates load on Polaris proportional to the number of services and their polling frequency
+- Use the `version` field on policies to avoid redundant reprocessing (see [Polling and Versioning](#polling-and-versioning))
+
+### Push: External-Catalog-Driven Notifications
+
+In the push model, the external system sends notifications to Polaris the moment a change occurs. Polaris
+does not poll the external system — it reacts to incoming notifications.
+
+**Best suited for:**
+- External catalogs (Hive Metastore, AWS Glue, custom REST catalogs) that manage table lifecycle outside Polaris
+- Scenarios where Polaris must mirror metadata from another system with minimal delay
+- Multi-catalog environments where Polaris is the unified governance layer but does not own all table metadata
+
+**Advantages:**
+- Near-real-time synchronization — Polaris learns about changes as soon as they happen
+- No polling overhead on either side
+- Timestamp-based ordering ensures consistency even under concurrent updates
+- `VALIDATE` pre-checks prevent wasted work when permissions or locations are misconfigured
+
+**Considerations:**
+- Requires the external system to implement notification sending logic (e.g., a Hive hook or a CDC pipeline)
+- The caller must manage monotonically increasing timestamps and handle `409 Conflict` rejections
+- Only applies to table metadata synchronization — not for querying policies or configuration

Review Comment:
   ok :slightly_smiling_face: but it might be best to rephrase since users will read it and make assumptions followed by questions, I'm sure :sweat_smile:
