This is an automated email from the ASF dual-hosted git repository.
dimas pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris.git
The following commit(s) were added to refs/heads/main by this push:
new 148c1a702 Add getting started with Apache Ozone (#2853)
148c1a702 is described below
commit 148c1a70220e82d5160af65db5e4d91c9eb5d6c3
Author: Dmitri Bourlatchkov <[email protected]>
AuthorDate: Fri Oct 31 14:12:04 2025 -0400
Add getting started with Apache Ozone (#2853)
* Add getting started with Apache Ozone
Use Apache Ozone as an example S3 impl. that does not have STS.
* fix typo in MinIO readme
---
getting-started/assets/cloud_providers/await-s3.sh | 42 +++++++
getting-started/minio/README.md | 2 +-
getting-started/{minio => ozone}/README.md | 51 ++++----
getting-started/ozone/docker-compose.yml | 131 +++++++++++++++++++++
4 files changed, 204 insertions(+), 22 deletions(-)
diff --git a/getting-started/assets/cloud_providers/await-s3.sh b/getting-started/assets/cloud_providers/await-s3.sh
new file mode 100755
index 000000000..9045447bb
--- /dev/null
+++ b/getting-started/assets/cloud_providers/await-s3.sh
@@ -0,0 +1,42 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+ENDPOINT=$1
+# "invalidKey" in combination with SigV4 means "public" access
+KEY_ID=${2:-"invalidKey"}
+SECRET=${3:-"secret"}
+SLEEP=${4:-"1"}
+
+if [ -z "$ENDPOINT" ]; then
+ echo Endpoint must be provided
+ exit 1
+fi
+
+# Make up to 30 attempts to list buckets. Success means the service is available
+for i in `seq 1 30`; do
+ echo "Listing buckets at $ENDPOINT"
+ curl --user "$KEY_ID:$SECRET" --aws-sigv4 "aws:amz:us-west-1:s3" "$ENDPOINT"
+ if [[ "$?" == "0" ]]; then
+ echo
+ echo "$ENDPOINT is available"
+ break
+ fi
+ echo "Sleeping $SLEEP ..."
+ sleep $SLEEP
+done
\ No newline at end of file
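The loop in `await-s3.sh` above retries a single `curl` call up to 30 times and then falls through whether or not any attempt succeeded. A minimal sketch of the same retry pattern as a reusable function is shown here; the `retry` name and its argument order are hypothetical and do not appear in the commit. Unlike the loop above, the sketch returns a non-zero status when every attempt fails, so a caller could stop instead of proceeding.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): run a command up to MAX
# times, sleeping PAUSE seconds between failed attempts.
# Usage: retry MAX PAUSE command [args...]
retry() {
  max=$1
  pause=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max" ]; do
    # Success on any attempt ends the loop immediately.
    if "$@"; then
      return 0
    fi
    attempt=$((attempt + 1))
    # Sleep only if another attempt is still to come.
    if [ "$attempt" -le "$max" ]; then
      sleep "$pause"
    fi
  done
  # Every attempt failed: report that to the caller.
  return 1
}
```

With this helper, the availability check could be written as `retry 30 1 curl --user "$KEY_ID:$SECRET" --aws-sigv4 "aws:amz:us-west-1:s3" "$ENDPOINT"`, followed by `|| exit 1` if setup should abort when the endpoint never comes up.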
diff --git a/getting-started/minio/README.md b/getting-started/minio/README.md
index 65293c21b..18437a257 100644
--- a/getting-started/minio/README.md
+++ b/getting-started/minio/README.md
@@ -60,7 +60,7 @@ bin/spark-sql \
--conf spark.sql.catalog.polaris.client.region=irrelevant
```
-Note: `s3cr3t` is defined as the password for the `root` users in the `docker-compose.yml` file.
+Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
since MinIO does not require a specific region.
diff --git a/getting-started/minio/README.md b/getting-started/ozone/README.md
similarity index 61%
copy from getting-started/minio/README.md
copy to getting-started/ozone/README.md
index 65293c21b..3f707a87a 100644
--- a/getting-started/minio/README.md
+++ b/getting-started/ozone/README.md
@@ -17,31 +17,24 @@
under the License.
-->
-# Getting Started with Apache Polaris and MinIO
+# Getting Started with Apache Polaris and Apache Ozone
## Overview
-This example uses MinIO as a storage provider with Polaris.
+This example uses [Apache Ozone](https://ozone.apache.org/) as a storage provider with Polaris.
Spark is used as a query engine. This example assumes a local Spark installation.
See the [Spark Notebooks Example](../spark/README.md) for a more advanced Spark setup.
## Starting the Example
-1. Build the Polaris server image if it's not already present locally:
+Start the docker compose group by running the following command from the root of the repository:
- ```shell
- ./gradlew \
- :polaris-server:assemble \
- :polaris-server:quarkusAppPartsBuild --rerun \
- -Dquarkus.container-image.build=true
- ```
-
-2. Start the docker compose group by running the following command from the root of the repository:
+```shell
+docker compose -f getting-started/ozone/docker-compose.yml up
+```
- ```shell
- docker compose -f getting-started/minio/docker-compose.yml up
- ```
+Note: this example pulls the `apache/polaris:latest` image, but assumes the image is `1.2.0-incubating` or later.
## Connecting From Spark
@@ -55,15 +48,14 @@ bin/spark-sql \
--conf spark.sql.catalog.polaris.token-refresh-enabled=false \
--conf spark.sql.catalog.polaris.warehouse=quickstart_catalog \
--conf spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL \
- --conf spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials \
--conf spark.sql.catalog.polaris.credential=root:s3cr3t \
--conf spark.sql.catalog.polaris.client.region=irrelevant
```
-Note: `s3cr3t` is defined as the password for the `root` users in the `docker-compose.yml` file.
+Note: `s3cr3t` is defined as the password for the `root` user in the `docker-compose.yml` file.
-Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in this example
-since MinIO does not require a specific region.
+Note: The `client.region` configuration is required for the AWS S3 client to work, but it is not used in
+this example since Ozone does not require a specific region.
## Running Queries
@@ -84,11 +76,28 @@ abc
Time taken: 0.579 seconds, Fetched 1 row(s)
```
-## MinIO Endpoints
+## Lack of Credential Vending
+
+Notice that the Spark configuration does not contain an `X-Iceberg-Access-Delegation` header.
+This is because Ozone does not support the STS API and consequently cannot produce session
+credentials to be vended to Polaris clients.
+
+The lack of STS API is represented in the Catalog storage configuration by the
+`stsUnavailable=true` property.
+
+## S3 Credentials
+
+In this example Ozone does not require credentials for accessing its S3 API. Therefore, neither
+Polaris nor Spark uses any S3 access keys.
+
+If Ozone were configured to require credentials, Spark and Polaris would have to use their own separate
+S3 access key / secret properties because credential vending is not possible with Ozone 2.0.0.
+
+## S3 Endpoints
Note that the catalog configuration defined in the `docker-compose.yml` contains
different endpoints for the Polaris Server and the client (Spark). Specifically,
-the client endpoint is `http://localhost:9000`, but `endpointInternal` is `http://minio:9000`.
+the client endpoint is `http://localhost:9878`, but `endpointInternal` is `http://ozone-s3g:9878`.
This is necessary because clients running on `localhost` do not normally see service
-names (such as `minio`) that are internal to the docker compose environment.
+names (such as `ozone-s3g`) that are internal to the docker compose environment.
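The "S3 Credentials" section of the Ozone README states that Spark and Polaris would each need their own S3 access key and secret if Ozone required credentials. On the Spark side, one possible way to pass static keys is through Iceberg's `s3.access-key-id` and `s3.secret-access-key` catalog properties, assuming the catalog ends up using Iceberg's `S3FileIO`. The sketch below only prints the extra flags; the `s3_static_conf` helper and the key values are hypothetical and are not part of this commit.

```shell
#!/bin/sh
# Hypothetical helper (not part of the commit): print the extra spark-sql
# flags that carry static S3 keys for a given Iceberg catalog name.
# Assumption: the catalog reads Iceberg's s3.access-key-id and
# s3.secret-access-key properties (S3FileIO).
# Usage: s3_static_conf CATALOG KEY_ID SECRET
s3_static_conf() {
  catalog=$1
  key_id=$2
  secret=$3
  printf '%s\n' \
    "--conf spark.sql.catalog.${catalog}.s3.access-key-id=${key_id}" \
    "--conf spark.sql.catalog.${catalog}.s3.secret-access-key=${secret}"
}

# Example with placeholder values; real keys would come from the Ozone setup.
s3_static_conf polaris exampleKey exampleSecret
```

The printed flags would be appended to the `bin/spark-sql` invocation shown in the README. The Polaris server would need its keys configured separately, for example through the container environment.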
diff --git a/getting-started/ozone/docker-compose.yml b/getting-started/ozone/docker-compose.yml
new file mode 100644
index 000000000..9b4d49173
--- /dev/null
+++ b/getting-started/ozone/docker-compose.yml
@@ -0,0 +1,131 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+services:
+
+ ozone-datanode:
+ image: &ozone-image apache/ozone:2.0.0
+ ports:
+ - 9864
+ command: ["ozone","datanode"]
+ environment:
+ &ozone-common-config
+ OZONE-SITE.XML_hdds.datanode.dir: "/data/hdds"
+ OZONE-SITE.XML_ozone.metadata.dirs: "/data/metadata"
+ OZONE-SITE.XML_ozone.om.address: "ozone-om"
+ OZONE-SITE.XML_ozone.om.http-address: "ozone-om:9874"
+ OZONE-SITE.XML_ozone.recon.address: "ozone-recon:9891"
+ OZONE-SITE.XML_ozone.recon.db.dir: "/data/metadata/recon"
+ OZONE-SITE.XML_ozone.replication: "1"
+ OZONE-SITE.XML_ozone.scm.block.client.address: "ozone-scm"
+ OZONE-SITE.XML_ozone.scm.client.address: "ozone-scm"
+ OZONE-SITE.XML_ozone.scm.datanode.id.dir: "/data/metadata"
+ OZONE-SITE.XML_ozone.scm.names: "ozone-scm"
+ no_proxy: "ozone-om,ozone-recon,ozone-scm,ozone-s3g,localhost,127.0.0.1"
+ ozone-om:
+ image: *ozone-image
+ ports:
+ - 9874:9874
+ environment:
+ <<: *ozone-common-config
+ CORE-SITE.XML_hadoop.proxyuser.hadoop.hosts: "*"
+ CORE-SITE.XML_hadoop.proxyuser.hadoop.groups: "*"
+ ENSURE_OM_INITIALIZED: /data/metadata/om/current/VERSION
+ WAITFOR: ozone-scm:9876
+ command: ["ozone","om"]
+ ozone-scm:
+ image: *ozone-image
+ ports:
+ - 9876:9876
+ environment:
+ <<: *ozone-common-config
+ ENSURE_SCM_INITIALIZED: /data/metadata/scm/current/VERSION
+ command: ["ozone","scm"]
+ ozone-recon:
+ image: *ozone-image
+ ports:
+ - 9888:9888
+ environment:
+ <<: *ozone-common-config
+ command: ["ozone","recon"]
+ ozone-s3g:
+ image: *ozone-image
+ ports:
+ - 9878:9878
+ environment:
+ <<: *ozone-common-config
+ command: ["ozone","s3g"]
+
+ polaris:
+ image: apache/polaris:latest
+ ports:
+ # API port
+ - "8181:8181"
+ # Optional, allows attaching a debugger to the Polaris JVM
+ - "5005:5005"
+ environment:
+ JAVA_DEBUG: true
+ JAVA_DEBUG_PORT: "*:5005"
+ AWS_REGION: us-west-2
+ AWS_ACCESS_KEY_ID: minio_root
+ AWS_SECRET_ACCESS_KEY: m1n1opwd
+ POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,root,s3cr3t
+ polaris.realm-context.realms: POLARIS
+ quarkus.otel.sdk.disabled: "true"
+ healthcheck:
+ test: ["CMD", "curl", "http://localhost:8182/q/health"]
+ interval: 2s
+ timeout: 10s
+ retries: 10
+ start_period: 10s
+
+ polaris-setup:
+ image: alpine/curl
+ depends_on:
+ polaris:
+ condition: service_healthy
+ environment:
+ - CLIENT_ID=root
+ - CLIENT_SECRET=s3cr3t
+ volumes:
+ - ../assets/:/assets/
+ entrypoint: "/bin/sh"
+ command:
+ - "-c"
+ - >-
+ /assets/cloud_providers/await-s3.sh http://ozone-s3g:9878/ ;
+ source /assets/polaris/obtain-token.sh;
+ echo Creating bucket...;
+ curl -X PUT --user "invalidKey:secret" --aws-sigv4 "aws:amz:us-west-1:s3" \
+ http://ozone-s3g:9878/bucket123 ;
+ echo Creating catalog...;
+ export STORAGE_CONFIG_INFO='{"storageType":"S3",
+ "endpoint":"http://localhost:9878",
+ "endpointInternal":"http://ozone-s3g:9878",
+ "stsUnavailable":true,
+ "pathStyleAccess":true}';
+ export STORAGE_LOCATION='s3://bucket123';
+ /assets/polaris/create-catalog.sh POLARIS $$TOKEN;
+ echo Extra grants...;
+ curl -H "Authorization: Bearer $$TOKEN" -H 'Content-Type: application/json' \
+ -X PUT \
+ http://polaris:8181/api/management/v1/catalogs/quickstart_catalog/catalog-roles/catalog_admin/grants \
+ -d '{"type":"catalog", "privilege":"CATALOG_MANAGE_CONTENT"}';
+ echo Done.;
+