dimas-b commented on code in PR #3990:
URL: https://github.com/apache/polaris/pull/3990#discussion_r2925592551


##########
site/content/in-dev/unreleased/delegation-service.md:
##########
@@ -0,0 +1,733 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Delegation Service
+type: docs
+weight: 430
+---
+
+The Polaris Delegation Service enables external services — compute engines, 
orchestrators, policy enforcement points, and
+maintenance tools — to interact with Polaris as a centralized governance and 
metadata authority.
+
+External services can integrate with Polaris using three complementary 
patterns:
+
+- **Pull**: The external service queries Polaris on demand to fetch policies, 
configuration, or metadata.
+- **Push**: The external service sends notifications to Polaris to inform it 
of changes happening externally.
+- **Event**: Polaris emits internal events at every operation boundary, 
enabling custom listeners to bridge Polaris with external systems reactively.
+
+## Architecture Overview
+
+```
+ ┌──────────────────────┐         ┌─────────────────────┐
+ │   External Service   │         │   External Catalog   │
+ │  (Spark, Trino,      │         │  (Hive, Glue,       │
+ │   Flink, custom)     │         │   custom REST)       │
+ └──────┬───────────────┘         └──────┬──────────────┘
+        │                                │
+        │  PULL: query policies,         │  PUSH: send table
+        │  metadata, configuration       │  change notifications
+        │                                │
+        ▼                                ▼
+ ┌──────────────────────────────────────────────────────┐
+ │                  Apache Polaris                       │
+ │                                                      │
+ │  ┌──────────┐  ┌─────────────┐  ┌────────────────┐  │
+ │  │ Policy   │  │ Notification│  │ Event Listener │  │
+ │  │ API      │  │ API         │  │ Framework      │  │
+ │  └──────────┘  └─────────────┘  └───────┬────────┘  │
+ │                                         │           │
+ └─────────────────────────────────────────┼───────────┘
+                                           │
+                                    EVENT: BEFORE/AFTER
+                                    hooks on all operations
+                                           │
+                                           ▼
+                                 ┌──────────────────┐
+                                 │ Custom Listeners  │
+                                 │ (Kafka, webhooks, │
+                                 │  CloudWatch, ...) │
+                                 └──────────────────┘
+```
+
+## Pull vs Push — Choosing the Right Integration Pattern
+
+The pull and push patterns serve different purposes and have different 
trade-offs. Understanding when to use
+each — or combine them — is key to building a robust integration with Polaris.
+
+### Pull: Service-Driven Queries
+
+In the pull model, the external service decides **when** and **what** to 
query. This gives the service full
+control over timing and request volume.
+
+**Best suited for:**
+- Compute engines that need policies or metadata at query planning time (e.g., 
Spark, Trino, Flink)
+- Maintenance services that run on a fixed schedule (e.g., a compaction job 
that runs every hour)
+- Services that need to discover resources dynamically (listing namespaces and 
tables)
+- Scenarios where the external service is the source of truth for when work 
should happen
+
+**Advantages:**
+- Simple to implement — standard REST API calls
+- No infrastructure needed beyond an HTTP client
+- Service controls its own polling frequency and backoff strategy
+- Naturally resilient: a missed poll only delays detection until the next poll, and nothing is lost
+
+**Considerations:**
+- Introduces polling latency — changes are not detected until the next poll 
cycle
+- Generates load on Polaris proportional to the number of services and their 
polling frequency
+- Use the `version` field on policies to avoid redundant reprocessing (see 
[Polling and Versioning](#polling-and-versioning))
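
The version check mentioned in the last consideration could look roughly like the sketch below. It is illustrative only: it assumes each fetched policy is a dictionary with `name` and `version` keys (the text confirms only that a `version` field exists), and it leaves the HTTP call that fetches the policies to the caller.

```python
from typing import Any, Dict, Iterable, List, Mapping


def changed_policies(
    fetched: Iterable[Mapping[str, Any]],
    seen_versions: Dict[str, int],
) -> List[Mapping[str, Any]]:
    """Return the policies whose version differs from the one last processed.

    `seen_versions` maps a policy name to the version handled in an earlier
    poll and is updated in place. The `name` and `version` keys are assumed
    field names, not a confirmed Polaris response shape.
    """
    changed = []
    for policy in fetched:
        name = str(policy["name"])
        version = int(policy["version"])
        if seen_versions.get(name) != version:
            changed.append(policy)
            seen_versions[name] = version
    return changed
```

A poller would call `changed_policies` after every fetch and reprocess only the returned entries, so an unchanged policy costs one dictionary lookup per cycle.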
+
+### Push: External-Catalog-Driven Notifications
+
+In the push model, the external system sends notifications to Polaris the 
moment a change occurs. Polaris
+does not poll the external system — it reacts to incoming notifications.
+
+**Best suited for:**
+- External catalogs (Hive Metastore, AWS Glue, custom REST catalogs) that 
manage table lifecycle outside Polaris
+- Scenarios where Polaris must mirror metadata from another system with 
minimal delay
+- Multi-catalog environments where Polaris is the unified governance layer but 
does not own all table metadata
+
+**Advantages:**
+- Near-real-time synchronization — Polaris learns about changes as soon as 
they happen
+- No polling overhead on either side
+- Timestamp-based ordering ensures consistency even under concurrent updates
+- `VALIDATE` pre-checks prevent wasted work when permissions or locations are 
misconfigured
+
+**Considerations:**
+- Requires the external system to implement notification sending logic (e.g., 
a Hive hook or a CDC pipeline)
+- The caller must manage monotonically increasing timestamps and handle `409 Conflict` rejections
+- Only applies to table metadata synchronization — not for querying policies 
or configuration
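
One way a caller might handle the timestamp and `409 Conflict` requirements above is sketched below. The names `MonotonicTimestamps` and `send_notification`, the `timestamp` payload key, and the injected `post` function are all hypothetical; the real request path and payload shape should be taken from the Polaris notification API documentation.

```python
import time
from typing import Callable


class MonotonicTimestamps:
    """Hands out strictly increasing millisecond timestamps."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._last = 0

    def next(self) -> int:
        now_ms = int(self._clock() * 1000)
        # If the clock stalls or steps backwards, move one past the last value.
        self._last = max(now_ms, self._last + 1)
        return self._last


def send_notification(
    post: Callable[[dict], int],
    timestamps: MonotonicTimestamps,
    payload: dict,
) -> str:
    """Send `payload` with a fresh timestamp and classify the response.

    `post` is a caller-supplied function that performs the HTTP request and
    returns the status code. The "timestamp" key is an assumed field name.
    """
    body = {**payload, "timestamp": timestamps.next()}
    status = post(body)
    if status == 409:
        return "stale"  # Rejected: a newer notification was already applied.
    return "accepted" if 200 <= status < 300 else "failed"
```

The sketch reports a `409` as `"stale"` rather than retrying, because whether a rejected notification should be resent depends on whether it still describes the latest state of the table.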
+
+### Combining Pull and Push with Event Listeners
+
+For the most responsive integrations, combine pull and push with the event 
listener pattern:
+
+```
+ ┌──────────────┐   1. PUSH notification    ┌────────────┐
+ │ External     │ ─────────────────────────→ │  Polaris   │
+ │ Catalog      │                            │            │
+ └──────────────┘                            │  2. Fires  │
+                                             │  AFTER_    │
+                                             │  SEND_     │
+                                             │  NOTIF.    │
+                                             └─────┬──────┘
+                                                   │
+                              3. Event listener     │
+                              forwards to Kafka     │
+                                                   ▼
+ ┌──────────────┐   4. Consumes event        ┌────────────┐
+ │ Maintenance  │ ←──────────────────────────│   Kafka    │
+ │ Service      │                            └────────────┘
+ │              │   5. PULL: fetch policies
+ │              │ ─────────────────────────→  Polaris
+ └──────────────┘
+```
+
+In this pattern:
+1. An external catalog **pushes** metadata changes to Polaris via notifications.
+2. Polaris fires an `AFTER_SEND_NOTIFICATION` event.
+3. A custom event listener forwards the event to a message queue (Kafka in the diagram).
+4. A maintenance service consumes the event from the queue.
+5. The maintenance service **pulls** the applicable policies from Polaris to decide what action to take.
+
+This eliminates polling latency while keeping the maintenance service 
decoupled from the notification flow.
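
The consumer side of this flow (consume an event, then pull policies) might be structured as in the following sketch. Everything here is an assumption made for illustration: the `table` key on the event, the `type` key on each policy, and the injected `fetch_policies` and `actions` arguments, which stand in for the queue client and the Polaris HTTP calls.

```python
from typing import Any, Callable, Iterable, List, Mapping


def handle_table_event(
    event: Mapping[str, Any],
    fetch_policies: Callable[[str], Iterable[Mapping[str, Any]]],
    actions: Mapping[str, Callable[[str], None]],
) -> List[str]:
    """Pull the policies for the table named in `event` and run matching actions.

    Returns the policy types that triggered an action, in order. The event
    and policy field names are hypothetical.
    """
    table = event["table"]
    triggered = []
    for policy in fetch_policies(table):
        action = actions.get(policy["type"])
        if action is None:
            continue  # No handler registered for this policy type.
        action(table)
        triggered.append(policy["type"])
    return triggered
```

Passing the fetch and action functions in as arguments keeps the handler independent of any particular queue or HTTP client, which mirrors the decoupling described above.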
+
+## Access Delegation Modes
+
+When external services interact with Polaris to access table data, Polaris 
supports two credential
+delegation mechanisms via the `X-Iceberg-Access-Delegation` HTTP header, as 
defined by the Iceberg
+REST API specification:
+
+| Mode | Header Value | Description |
+|------|-------------|-------------|
+| **Vended Credentials** | `vended-credentials` | Polaris generates 
short-lived, scoped credentials (e.g., AWS STS tokens) and sends them to the 
client. The client uses these credentials directly to access storage. |
+| **Remote Signing** | `remote-signing` | The client sends data access 
requests back to Polaris, which signs them on behalf of the client. The client 
never receives raw storage credentials. |

Review Comment:
   maybe add `WIP` for now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
