flyrain commented on code in PR #2761:
URL: https://github.com/apache/polaris/pull/2761#discussion_r2407833513

##########
site/content/in-dev/unreleased/federation/hive-metastore-federation.md:
##########
@@ -0,0 +1,125 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Hive Metastore Federation
+type: docs
+weight: 705
+---
+
+Polaris can federate catalog operations to an existing Hive Metastore (HMS). This lets an external
+HMS remain the source of truth for table metadata while Polaris brokers access, policies, and
+multi-engine connectivity.
+
+## Build-time enablement
+
+The Hive factory is packaged as an optional extension and is not baked into default server builds.
+Include it when assembling the runtime or container images by setting the `NonRESTCatalogs` Gradle
+property to include `HIVE` (and any other non-REST backends you need):
+
+```bash
+./gradlew :polaris-server:assemble :polaris-server:quarkusAppPartsBuild --rerun \
+  -DNonRESTCatalogs=HIVE -Dquarkus.container-image.build=true
+```
+
+`runtime/server/build.gradle.kts` wires the extension in only when this flag is present, so binaries
+built without it will reject Hive federation requests.
+
+## Runtime requirements
+
+- **Metastore connectivity:** Expose the HMS Thrift endpoint (`thrift://host:port`) to the Polaris
+  deployment.
+- **Configuration discovery:** Iceberg’s `HiveCatalog` loads Hadoop/Hive client settings from the
+  classpath. Provide `hive-site.xml` (and `core-site.xml` if needed) via
+  `HADOOP_CONF_DIR`/`HIVE_CONF_DIR` or an image layer.
+- **Authentication:** Hive federation only supports `IMPLICIT` authentication, meaning Polaris uses
+  the operating-system or Kerberos identity of the running process (no stored secrets). Ensure the
+  service principal is logged in or holds a valid keytab/TGT before starting Polaris.
+- **Object storage role:** Configure `polaris.service-identity.<realm>.aws-iam.*` (or the default
+  realm) so the server can assume the AWS role referenced by the catalog. The IAM role must allow
+  STS access from the Polaris service identity and grant permissions to the table locations.
+
+### Kerberos setup example
+
+If your Hive Metastore enforces Kerberos, stage the necessary configuration alongside Polaris:
+
+```bash
+export KRB5_CONFIG=/etc/polaris/krb5.conf
+export HADOOP_CONF_DIR=/etc/polaris/hadoop-conf # contains hive-site.xml with HMS principal
+export HADOOP_OPTS="-Djava.security.auth.login.config=/etc/polaris/jaas.conf"
+kinit -kt /etc/polaris/keytabs/polaris.keytab polaris/[email protected]
+```
+
+- `hive-site.xml` must define `hive.metastore.sasl.enabled=true`, the metastore principal, and
+  client principal pattern (for example `hive.metastore.client.kerberos.principal=polaris/_HOST@REALM`).
+- The JAAS entry (referenced by `java.security.auth.login.config`) should use `useKeyTab=true` and
+  point to the same keytab shown above so the Polaris JVM can refresh credentials automatically.
+- Keep the keytab readable solely by the Polaris service user; the implicit authenticator consumes
+  the TGT at startup and for periodic renewal.
+
+## Creating a federated catalog
+
+Use the Management API (or the Python CLI) to create an external catalog whose connection type is
+`HIVE`. The following request registers a catalog that proxies to an HMS running on
+`thrift://hms.example.internal:9083`:
+
+```bash
+curl -X POST https://<polaris-host>/management/v1/catalogs \
+  -H "Authorization: Bearer $TOKEN" \
+  -H "Content-Type: application/json" \
+  -d '{
+        "type": "EXTERNAL",
+        "name": "analytics_hms",
+        "storageConfigInfo": {
+          "storageType": "S3",
+          "roleArn": "arn:aws:iam::123456789012:role/polaris-warehouse-access",
+          "region": "us-east-1"
+        },
+        "properties": { "default-base-location": "s3://analytics-bucket/warehouse/" },
+        "connectionConfigInfo": {
+          "connectionType": "HIVE",
+          "uri": "thrift://hms.example.internal:9083",
+          "warehouse": "s3://analytics-bucket/warehouse/",
+          "authenticationParameters": { "authenticationType": "IMPLICIT" }
+        }
+      }'

Review Comment:
   Thanks @XJDKC! I can add the CLI version as well.
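
For reference while the CLI variant is pending, the request body from the curl example in the quoted diff could also be assembled programmatically. The following is a minimal Python sketch that assumes only the field names and endpoint path shown in that example; `build_hive_catalog_payload`, `build_request`, the host `polaris.example.internal`, and the token value are hypothetical names introduced for illustration and are not part of Polaris. The request is prepared but deliberately not sent.

```python
import json
import urllib.request


def build_hive_catalog_payload(name, hms_uri, warehouse, role_arn, region):
    """Assemble the JSON body; field names are copied from the curl example in the diff."""
    return {
        "type": "EXTERNAL",
        "name": name,
        "storageConfigInfo": {
            "storageType": "S3",
            "roleArn": role_arn,
            "region": region,
        },
        "properties": {"default-base-location": warehouse},
        "connectionConfigInfo": {
            "connectionType": "HIVE",
            "uri": hms_uri,
            "warehouse": warehouse,
            # The doc states Hive federation only supports IMPLICIT authentication.
            "authenticationParameters": {"authenticationType": "IMPLICIT"},
        },
    }


def build_request(polaris_host, token, payload):
    """Prepare (but do not send) the POST used in the curl example."""
    return urllib.request.Request(
        url=f"https://{polaris_host}/management/v1/catalogs",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Values below mirror the example in the diff; host and token are placeholders.
payload = build_hive_catalog_payload(
    name="analytics_hms",
    hms_uri="thrift://hms.example.internal:9083",
    warehouse="s3://analytics-bucket/warehouse/",
    role_arn="arn:aws:iam::123456789012:role/polaris-warehouse-access",
    region="us-east-1",
)
request = build_request("polaris.example.internal", "dummy-token", payload)
# Passing `request` to urllib.request.urlopen would submit it; omitted here on purpose.
```

Keeping payload construction separate from the HTTP call makes it easy to inspect or unit-test the body before anything is sent to the Management API.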
