This is an automated email from the ASF dual-hosted git repository.
yufei pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris.git
The following commit(s) were added to refs/heads/main by this push:
new a496a6fc3 Site: Add docs for catalog federation (#2761)
a496a6fc3 is described below
commit a496a6fc312ba69d69e95ee50ba541ccb077cfe6
Author: Yufei Gu <[email protected]>
AuthorDate: Fri Oct 10 17:06:26 2025 -0700
Site: Add docs for catalog federation (#2761)
---
.../content/in-dev/unreleased/federation/_index.md | 26 +++++
.../federation/hive-metastore-federation.md | 125 +++++++++++++++++++++
.../federation/iceberg-rest-federation.md | 71 ++++++++++++
3 files changed, 222 insertions(+)
diff --git a/site/content/in-dev/unreleased/federation/_index.md b/site/content/in-dev/unreleased/federation/_index.md
new file mode 100644
index 000000000..e4fbe261a
--- /dev/null
+++ b/site/content/in-dev/unreleased/federation/_index.md
@@ -0,0 +1,26 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Federation
+type: docs
+weight: 703
+---
+
+Guides for federating Polaris with existing metadata services. Expand this section to select a
+specific integration.
diff --git a/site/content/in-dev/unreleased/federation/hive-metastore-federation.md b/site/content/in-dev/unreleased/federation/hive-metastore-federation.md
new file mode 100644
index 000000000..0d39a5e4a
--- /dev/null
+++ b/site/content/in-dev/unreleased/federation/hive-metastore-federation.md
@@ -0,0 +1,125 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Hive Metastore Federation
+type: docs
+weight: 705
+---
+
+Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external
+HMS remain the source of truth for table metadata while Polaris brokers access, policies, and
+multi-engine connectivity.
+
+## Build-time enablement
+
+The Hive factory is packaged as an optional extension and is not baked into default server builds.
+Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle
+property to include `HIVE` (and any other non-REST backends you need):
+
+```bash
+./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
+ -DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true
+```
+
+`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries
+built without it will reject Hive federation requests.
+
+## Runtime requirements
+
+- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris
+ deployment.
+- **Configuration discovery:** Iceberg’s `HiveCatalog` loads Hadoop/Hive client settings from the
+ classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via
+ `HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer.
+- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses
+ the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the
+ service principal is logged in or holds a valid keytab/TGT before starting Polaris.
+- **Object storage role:** Configure `polaris.service-identity.<realm>.aws-iam.*` (or the default
+ realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow
+ STS access from the Polaris service identity and grant permissions to the table locations.
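
The connectivity and configuration-discovery requirements above can be checked before Polaris starts. The following is a minimal sketch, not part of Polaris: `parse_thrift_uri` and `has_hive_site` are hypothetical helper names, and the script assumes the HMS endpoint is written exactly as `thrift://host:port`.

```shell
# Hypothetical preflight helpers (illustrative only; not shipped with Polaris).

# Split a thrift://host:port URI into "host port"; return 1 on any other shape.
parse_thrift_uri() {
  uri=$1
  case "$uri" in
    thrift://*:*) ;;
    *) echo "not a thrift://host:port URI: $uri" >&2; return 1 ;;
  esac
  hostport=${uri#thrift://}   # strip the scheme
  host=${hostport%%:*}        # text before the first colon
  port=${hostport##*:}        # text after the last colon
  case "$port" in
    ''|*[!0-9]*) echo "invalid port in $uri" >&2; return 1 ;;
  esac
  [ -n "$host" ] || { echo "missing host in $uri" >&2; return 1; }
  printf '%s %s\n' "$host" "$port"
}

# Succeed only when the given configuration directory contains hive-site.xml.
has_hive_site() {
  [ -f "$1/hive-site.xml" ]
}
```

The host and port printed by `parse_thrift_uri` can then be handed to whatever reachability probe your environment provides, and `has_hive_site "$HADOOP_CONF_DIR"` confirms the client settings are where Iceberg's `HiveCatalog` will look for them.
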
+
+### Kerberos setup example
+
+If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:
+
+```bash
+export KRB5_CONFIG=/etc/polaris/krb5.conf
+export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf # contains hive-site.xml with HMS principal
+export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
+kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/[email protected]
+```
+
+- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and
+ client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`).
+- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and
+ point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
+- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes
+ the TGT at startup and for periodic renewal.
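
A quick way to catch a missing Kerberos setting is to confirm that `hive-site.xml` names the properties listed above. The sketch below is an illustration under assumptions: `has_property` and `check_hive_site` are hypothetical helpers, `hive.metastore.kerberos.principal` is assumed to be the property that carries the metastore principal, and the check only looks for `<name>` elements without validating their values.

```shell
# Hypothetical check (not part of Polaris): verify that a hive-site.xml
# mentions the Kerberos-related properties. Presence only, values are not checked.

# Return 0 when file $1 contains a <name> element for property $2.
has_property() {
  grep -qF "<name>$2</name>" "$1"
}

# Report every expected property that is absent; return 1 if any is missing.
check_hive_site() {
  file=$1
  rc=0
  for prop in \
    hive.metastore.sasl.enabled \
    hive.metastore.kerberos.principal \
    hive.metastore.client.kerberos.principal
  do
    if ! has_property "$file" "$prop"; then
      echo "missing property: $prop" >&2
      rc=1
    fi
  done
  return $rc
}
```

Running `check_hive_site "$HADOOP_CONF_DIR/hive-site.xml"` before `kinit` gives an early, readable failure instead of a SASL error at catalog creation time.
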
+
+## Creating a federated catalog
+
+Use the Management API (or the Python CLI) to create an external catalog whose connection type is
+`HIVE`. The following request registers a catalog that proxies to an HMS running on
+`thrift://hms.example.internal:9083`:
+
+```bash
+curl -X POST https://<polaris-host>/management/v1/catalogs \
+ -H "Authorization: Bearer $TOKEN" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "type": "EXTERNAL",
+ "name": "analytics_hms",
+ "storageConfigInfo": {
+ "storageType": "S3",
+ "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
+ "region": "us-east-1"
+ },
+ "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
+ "connectionConfigInfo": {
+ "connectionType": "HIVE",
+ "uri": "thrift://hms.example.internal:9083",
+ "warehouse": "s3://analytics-bucket/warehouse/",
+ "authenticationParameters": { "authenticationType": "IMPLICIT" }
+ }
+ }'
+```
+
+Grant catalog roles to principal roles exactly as you would for internal catalogs so engines can
+obtain tokens that authorize against the federated metadata.
+
+`default-base-location` is required; it tells Polaris and Iceberg where to place new metadata files.
+`allowedLocations` is optional; supply it only when you want to restrict writers to a specific set of
+prefixes. If your IAM trust policy requires an `externalId` or explicit `userArn`, include those
+optional fields in `storageConfigInfo`. Polaris persists them and supplies them when assuming the
+role cited by `roleArn` during metadata commits.
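
To show where the optional fields sit, here is a minimal sketch that prints a `storageConfigInfo` object carrying `externalId` and `allowedLocations`. The helper name `storage_config_json` and the example external ID are hypothetical, `allowedLocations` is assumed to be a JSON array of location prefixes, and the function does no JSON escaping, so it is only suitable for simple values.

```shell
# Hypothetical helper (not part of Polaris): print a storageConfigInfo object
# that includes the optional externalId and allowedLocations fields.
# Arguments: role ARN, external ID, one allowed location prefix.
storage_config_json() {
  role_arn=$1
  external_id=$2
  allowed_location=$3
  cat <<EOF
{
  "storageType": "S3",
  "roleArn": "$role_arn",
  "externalId": "$external_id",
  "allowedLocations": ["$allowed_location"]
}
EOF
}

# Example invocation with placeholder values.
storage_config_json \
  "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
  "polaris-demo-external-id" \
  "s3://analytics-bucket/warehouse/"
```

The printed object is meant to replace the `storageConfigInfo` value in a request body like the one in the `curl` example; a `userArn` field would be added alongside `externalId` in the same way if your trust policy needs it.
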
+
+## Limitations and operational notes
+
+- **Single identity:** Because only `IMPLICIT` authentication is permitted, Polaris cannot mix
+ multiple Hive identities in a single deployment (`HiveFederatedCatalogFactory` rejects other auth
+ types). Plan a deployment topology that aligns the Polaris process identity with the target HMS.
+- **Generic tables:** The Hive extension exposes Iceberg tables registered in HMS. Generic table
+ federation is not implemented (`HiveFederatedCatalogFactory#createGenericCatalog` throws
+ `UnsupportedOperationException`).
+- **Configuration caching:** Atlas-style catalog failover and multi-HMS routing are not yet handled;
+ Polaris initializes one `HiveCatalog` per connection and relies on the underlying Iceberg client
+ for retries.
+
+With these constraints satisfied, Polaris can sit in front of an HMS so that Iceberg tables managed
+there gain OAuth-protected, multi-engine access through the Polaris REST APIs.
diff --git a/site/content/in-dev/unreleased/federation/iceberg-rest-federation.md b/site/content/in-dev/unreleased/federation/iceberg-rest-federation.md
new file mode 100644
index 000000000..8318f4509
--- /dev/null
+++ b/site/content/in-dev/unreleased/federation/iceberg-rest-federation.md
@@ -0,0 +1,71 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Iceberg REST Federation
+type: docs
+weight: 704
+---
+
+Polaris can federate an external Iceberg REST catalog (e.g., another Polaris deployment, AWS Glue, or a custom Iceberg
+REST implementation), enabling a Polaris service to access table and view entities managed by remote Iceberg REST Catalogs.
+
+## Runtime requirements
+
+- **REST endpoint:** The remote service must implement the Iceberg REST specification. Configure
+ firewalls so Polaris can reach the base URI you provide in the connection config.
+- **Authentication:** Polaris forwards requests using the credentials defined in
+ `ConnectionConfigInfo.AuthenticationParameters`. OAuth2 client credentials, bearer tokens, and AWS
+ SigV4 are supported; choose the scheme the remote service expects.
+
+## Creating a federated REST catalog
+
+The snippet below registers an external catalog that forwards to a remote Polaris server using OAuth2
+client credentials. `iceberg-remote-catalog-name` is optional; supply it when the remote server multiplexes
+multiple logical catalogs under one URI.
+
+```bash
+polaris catalogs create \
+ --type EXTERNAL \
+ --storage-type s3 \
+ --role-arn "arn:aws:iam::123456789012:role/polaris-warehouse-access" \
+ --default-base-location "s3://analytics-bucket/warehouse/" \
+ --catalog-connection-type iceberg-rest \
+ --iceberg-remote-catalog-name analytics \
+ --catalog-uri "https://remote-polaris.example.com/catalog/v1" \
+ --catalog-authentication-type OAUTH \
+ --catalog-token-uri "https://remote-polaris.example.com/catalog/v1/oauth/tokens" \
+ --catalog-client-id "<remote-client-id>" \
+ --catalog-client-secret "<remote-client-secret>" \
+ --catalog-client-scopes "PRINCIPAL_ROLE:ALL" \
+ analytics_rest
+```
+
+Refer to the [CLI documentation](../command-line-interface.md#catalogs) for details on alternative authentication types such as BEARER or SIGV4.
+
+Grant catalog roles to principal roles the same way you do for internal catalogs so compute engines
+receive tokens with access to the federated namespace.
+
+## Operational notes
+
+- **Connectivity checks:** Polaris probes the remote service at catalog creation rather than lazily on first use;
+ creation fails if the REST endpoint is unreachable or authentication is rejected.
+- **Feature parity:** Federation exposes whatever table/namespace operations the remote service
+ implements. Unsupported features return the remote error directly to callers.
+- **Generic tables:** The REST federation path currently surfaces Iceberg tables only; generic table
+ federation is not implemented.