This is an automated email from the ASF dual-hosted git repository.

jbonofre pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris.git
The following commit(s) were added to refs/heads/main by this push:
     new 7f9ba404 Publish 0.9.0 documentation (#1175)

7f9ba404 is described below

commit 7f9ba4041ec9c58ec8b92869b6f0fe0055f68fe5
Author: JB Onofré <jbono...@apache.org>
AuthorDate: Fri Mar 14 16:37:46 2025 +0100

    Publish 0.9.0 documentation (#1175)
---
 site/content/in-dev/0.9.0/_index.md                |   38 +
 site/content/in-dev/0.9.0/access-control.md        |  193 ++++
 .../content/in-dev/0.9.0/command-line-interface.md | 1088 ++++++++++++++++++++
 .../0.9.0/configuring-polaris-for-production.md    |  131 +++
 site/content/in-dev/0.9.0/entities.md              |   89 ++
 site/content/in-dev/0.9.0/metastores.md            |  112 ++
 site/content/in-dev/0.9.0/overview.md              |  215 ++++
 .../in-dev/0.9.0/polaris-management-service.md     |   27 +
 site/content/in-dev/0.9.0/quickstart.md            |  332 ++++++
 site/content/in-dev/0.9.0/rest-catalog-open-api.md |   27 +
 site/hugo.yaml                                     |    5 +-
 11 files changed, 2256 insertions(+), 1 deletion(-)

diff --git a/site/content/in-dev/0.9.0/_index.md b/site/content/in-dev/0.9.0/_index.md
new file mode 100644
index 00000000..f3ce53c6
--- /dev/null
+++ b/site/content/in-dev/0.9.0/_index.md
@@ -0,0 +1,38 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+linkTitle: '0.9.0'
+title: '0.9.0'
+type: docs
+weight: 200
+params:
+  top_hidden: true
+  show_page_toc: false
+cascade:
+  type: docs
+  params:
+    show_page_toc: true
+# This file will NOT be copied into a new release's versioned docs folder.
+---
+
+Check out the [Quick Start]({{% ref "quickstart" %}}) page to get started.
+
+<!--
+Testing the `releaseVersion` shortcode here: version is: {{< releaseVersion >}}
+-->
diff --git a/site/content/in-dev/0.9.0/access-control.md b/site/content/in-dev/0.9.0/access-control.md
new file mode 100644
index 00000000..7c4c9cc8
--- /dev/null
+++ b/site/content/in-dev/0.9.0/access-control.md
@@ -0,0 +1,193 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+title: Access Control
+type: docs
+weight: 500
+---
+
+This section provides information about how access control works for Apache Polaris (Incubating).
+
+Polaris uses a role-based access control (RBAC) model in which the Polaris administrator assigns access privileges to catalog roles
+and then grants service principals access to resources by assigning catalog roles to principal roles.
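The privilege, catalog role, principal role, and service principal chain described above can be scripted with the Polaris CLI commands documented in this same commit. The following is a minimal sketch, not an official script. The helpers `run` and `grant_chain`, the `DRY_RUN` switch, and every catalog, role, and principal name are hypothetical. The sketch assumes that the catalog and the service principal already exist and that `polaris` is on the `PATH`.

```shell
#!/usr/bin/env bash
# Sketch: privilege -> catalog role -> principal role -> service principal.
# `run`, `grant_chain`, and DRY_RUN are hypothetical helpers, not part of Polaris.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# grant_chain CATALOG CATALOG_ROLE PRIVILEGE PRINCIPAL_ROLE PRINCIPAL
grant_chain() {
  local catalog=$1 catalog_role=$2 privilege=$3 principal_role=$4 principal=$5
  # 1. Privileges are granted to a catalog role, never directly to a principal role.
  run polaris catalog-roles create --catalog "$catalog" "$catalog_role"
  run polaris privileges catalog grant --catalog "$catalog" --catalog-role "$catalog_role" "$privilege"
  # 2. The catalog role is granted to a principal role.
  run polaris principal-roles create "$principal_role"
  run polaris catalog-roles grant --catalog "$catalog" --principal-role "$principal_role" "$catalog_role"
  # 3. The principal role is granted to the (already existing) service principal.
  run polaris principal-roles grant --principal "$principal" "$principal_role"
}
```

Running `DRY_RUN=1 grant_chain my_catalog catalog_contributor TABLE_WRITE_DATA data_engineer bob_spark` prints the five commands in order. Drop `DRY_RUN=1` to execute them against the server.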
+
+These are the key concepts for understanding access control in Polaris:
+
+- **Securable object**
+- **Principal role**
+- **Catalog role**
+- **Privilege**
+
+## Securable object
+
+A securable object is an object to which access can be granted. Polaris
+has the following securable objects:
+
+- Catalog
+- Namespace
+- Iceberg table
+- View
+
+## Principal role
+
+A principal role is a resource in Polaris that you can use to logically group Polaris service principals together and grant privileges on
+securable objects.
+
+Polaris supports a many-to-one relationship between service principals and principal roles. For example, to grant the same privileges to
+multiple service principals, you can grant a single principal role to those service principals. A service principal can be granted one
+principal role. When registering a service connection, the Polaris administrator specifies the principal role that is granted to the
+service principal.
+
+You don't grant privileges directly to a principal role. Instead, you configure object permissions at the catalog role level, and then grant
+catalog roles to a principal role.
+
+The following table shows examples of principal roles that you might configure in Polaris:
+
+| Principal role name | Description |
+| -----------------------| ----------- |
+| Data_engineer | A role that is granted to multiple service principals for running data engineering jobs. |
+| Data_scientist | A role that is granted to multiple service principals for running data science or AI jobs. |
+
+## Catalog role
+
+A catalog role belongs to a particular catalog resource in Polaris and specifies a set of permissions for actions on the catalog or objects
+in the catalog, such as catalog namespaces or tables. You can create one or more catalog roles for a catalog.
+
+You grant privileges to a catalog role and then grant the catalog role to a principal role to bestow the privileges on one or more service
+principals.
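To illustrate the grant-to-a-catalog-role step, the sketch below builds a read-only catalog role similar in spirit to a "Catalog readers" role. It then grants that same catalog role to two principal roles. The privilege list is an assumption about what read-only access needs, not an official definition. The catalog and role names and the `run`/`DRY_RUN` helper are hypothetical. Each privilege is granted at the catalog level with `polaris privileges catalog grant`.

```shell
#!/usr/bin/env bash
# Sketch: build a read-only catalog role and share it with several principal roles.
# Names, the privilege list, and the `run`/DRY_RUN helper are illustrative assumptions.
set -euo pipefail

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# setup_reader_role CATALOG CATALOG_ROLE PRINCIPAL_ROLE...
setup_reader_role() {
  local catalog=$1 catalog_role=$2
  shift 2
  local privilege principal_role
  run polaris catalog-roles create --catalog "$catalog" "$catalog_role"
  # Assumed minimal read-only set: list and read metadata, read table data.
  for privilege in NAMESPACE_LIST NAMESPACE_READ_PROPERTIES TABLE_LIST TABLE_READ_PROPERTIES TABLE_READ_DATA; do
    run polaris privileges catalog grant --catalog "$catalog" --catalog-role "$catalog_role" "$privilege"
  done
  # The same catalog role can be granted to any number of principal roles.
  for principal_role in "$@"; do
    run polaris catalog-roles grant --catalog "$catalog" --principal-role "$principal_role" "$catalog_role"
  done
}
```

For example, `DRY_RUN=1 setup_reader_role gold_zone catalog_reader data_scientist data_engineer` prints one `create`, five privilege grants, and two catalog-role grants.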
+
+> **Note**
+>
+> If you update the privileges bestowed on a service principal, the updates won't take effect for up to one hour. This means that if you
+> revoke or grant some privileges for a catalog, the updated privileges won't take effect on any service principal with access to that catalog
+> for up to one hour.
+
+Polaris also supports a many-to-many relationship between catalog roles and principal roles. You can grant the same catalog role to one or more
+principal roles. Likewise, a principal role can be granted one or more catalog roles.
+
+The following table displays examples of catalog roles that you might
+configure in Polaris:
+
+| Example Catalog role | Description |
+| -----------------------| ----------- |
+| Catalog administrators | A role that has been granted multiple privileges to emulate full access to the catalog.<br /><br />Principal roles that have been granted this role are permitted to create, alter, read, write, and drop tables in the catalog. |
+| Catalog readers | A role that has been granted read-only privileges to tables in the catalog.<br /><br />Principal roles that have been granted this role are allowed to read from tables in the catalog. |
+| Catalog contributor | A role that has been granted read and write access privileges to all tables that belong to the catalog.<br /><br />Principal roles that have been granted this role are allowed to perform read and write operations on tables in the catalog. |
+
+## RBAC model
+
+The following diagram illustrates the RBAC model used by Polaris. For each catalog, the Polaris administrator assigns access
+privileges to catalog roles and then grants service principals access to resources by assigning catalog roles to principal roles. Polaris
+supports a many-to-one relationship between service principals and principal roles.
+
+
+
+## Access control privileges
+
+This section describes the privileges that are available in the Polaris access control model. Privileges are granted to catalog roles, catalog
+roles are granted to principal roles, and principal roles are granted to service principals to specify the operations that service principals can
+perform on objects in Polaris.
+
+> **Important**
+>
+> You can only grant privileges at the catalog level. Fine-grained access controls are not available. For example, you can grant read
+> privileges to all tables in a catalog but not to an individual table in the catalog.
+
+To grant the full set of privileges (drop, list, read, write, etc.) on an object, you can use the *full privilege* option.
+
+### Table privileges
+
+| Privilege | Description |
+| --------- | ----------- |
+| TABLE_CREATE | Enables registering a table with the catalog. |
+| TABLE_DROP | Enables dropping a table from the catalog. |
+| TABLE_LIST | Enables listing any tables in the catalog. |
+| TABLE_READ_PROPERTIES | Enables reading [properties](https://iceberg.apache.org/docs/nightly/configuration/#table-properties) of the table. |
+| TABLE_WRITE_PROPERTIES | Enables configuring [properties](https://iceberg.apache.org/docs/nightly/configuration/#table-properties) for the table. |
+| TABLE_READ_DATA | Enables reading data from the table by receiving short-lived read-only storage credentials from the catalog. |
+| TABLE_WRITE_DATA | Enables writing data to the table by receiving short-lived read+write storage credentials from the catalog. |
+| TABLE_FULL_METADATA | Grants all table privileges, except TABLE_READ_DATA and TABLE_WRITE_DATA, which need to be granted individually. |
+
+### View privileges
+
+| Privilege | Description |
+| --------- | ----------- |
+| VIEW_CREATE | Enables registering a view with the catalog. |
+| VIEW_DROP | Enables dropping a view from the catalog. |
+| VIEW_LIST | Enables listing any views in the catalog. |
+| VIEW_READ_PROPERTIES | Enables reading all the view properties. |
+| VIEW_WRITE_PROPERTIES | Enables configuring view properties. |
+| VIEW_FULL_METADATA | Grants all view privileges. |
+
+### Namespace privileges
+
+| Privilege | Description |
+| --------- | ----------- |
+| NAMESPACE_CREATE | Enables creating a namespace in a catalog. |
+| NAMESPACE_DROP | Enables dropping the namespace from the catalog. |
+| NAMESPACE_LIST | Enables listing any object in the namespace, including nested namespaces and tables. |
+| NAMESPACE_READ_PROPERTIES | Enables reading all the namespace properties. |
+| NAMESPACE_WRITE_PROPERTIES | Enables configuring namespace properties. |
+| NAMESPACE_FULL_METADATA | Grants all namespace privileges. |
+
+### Catalog privileges
+
+| Privilege | Description |
+| -----------------------| ----------- |
+| CATALOG_MANAGE_ACCESS | Includes the ability to grant or revoke privileges on objects in a catalog to catalog roles, and the ability to grant or revoke catalog roles to or from principal roles. |
+| CATALOG_MANAGE_CONTENT | Enables full management of content for the catalog. This privilege encompasses the following privileges:<ul><li>CATALOG_MANAGE_METADATA</li><li>TABLE_FULL_METADATA</li><li>NAMESPACE_FULL_METADATA</li><li>VIEW_FULL_METADATA</li><li>TABLE_WRITE_DATA</li><li>TABLE_READ_DATA</li><li>CATALOG_READ_PROPERTIES</li><li>CATALOG_WRITE_PROPERTIES</li></ul> |
+| CATALOG_MANAGE_METADATA | Enables full management of the catalog, catalog roles, namespaces, and tables. |
+| CATALOG_READ_PROPERTIES | Enables listing catalogs and reading properties of the catalog. |
+| CATALOG_WRITE_PROPERTIES | Enables configuring catalog properties. |
+
+## RBAC example
+
+The following diagram illustrates how RBAC works in Polaris and
+includes the following users:
+
+- **Alice:** A service admin who signs up for Polaris. Alice can
+  create service principals. She can also create catalogs and
+  namespaces and configure access control for Polaris resources.
+
+- **Bob:** A data engineer who uses Apache Spark™ to
+  interact with Polaris.
+
+  - Alice has created a service principal for Bob. It has been
+    granted the Data_engineer principal role, which in turn has been
+    granted the following catalog roles: Catalog contributor and
+    Data administrator (for both the Silver and Gold zone catalogs
+    in the following diagram).
+
+  - The Catalog contributor role grants permission to create
+    namespaces and tables in the Bronze zone catalog.
+
+  - The Data administrator roles grant full administrative rights to
+    the Silver zone catalog and Gold zone catalog.
+
+- **Mark:** A data scientist who trains models with data managed
+  by Polaris.
+
+  - Alice has created a service principal for Mark. It has been
+    granted the Data_scientist principal role, which in turn has
+    been granted the catalog role named Catalog reader.
+
+  - The Catalog reader role grants read-only access to a catalog
+    named Gold zone catalog.
+
+
diff --git a/site/content/in-dev/0.9.0/command-line-interface.md b/site/content/in-dev/0.9.0/command-line-interface.md
new file mode 100644
index 00000000..4a26ed4b
--- /dev/null
+++ b/site/content/in-dev/0.9.0/command-line-interface.md
@@ -0,0 +1,1088 @@
+---
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+linkTitle: Command Line Interface
+title: Apache Polaris (Incubating) CLI
+type: docs
+weight: 300
+---
+
+To help administrators quickly set up and manage their Polaris server, Polaris provides a simple command-line interface (CLI) for common tasks.
+
+The basic syntax of the Polaris CLI is outlined below:
+
+```
+polaris [options] COMMAND ...
+
+options:
+--host
+--port
+--client-id
+--client-secret
+```
+
+`COMMAND` must be one of the following:
+1. catalogs
+2. principals
+3. principal-roles
+4. catalog-roles
+5. namespaces
+6. privileges
+
+Each _command_ supports several _subcommands_, and some _subcommands_ have _actions_ that come after the subcommand in turn. Finally, _arguments_ follow to form a full invocation. Within a set of named arguments at the end of an invocation, ordering is generally not important. Many invocations also have a required positional argument of the type that the _command_ refers to. Again, the ordering of this positional argument relative to named arguments is not important.
+
+Some example full invocations:
+
+```
+polaris principals list
+polaris catalogs delete some_catalog_name
+polaris catalogs update --property foo=bar some_other_catalog
+polaris catalogs update another_catalog --property k=v
+polaris privileges namespace grant --namespace some.schema --catalog fourth_catalog --catalog-role some_catalog_role TABLE_READ_DATA
+```
+
+### Authentication
+
+As outlined above, the Polaris CLI may take credentials using the `--client-id` and `--client-secret` options. For example:
+
+```
+polaris --client-id 4b5ed1ca908c3cc2 --client-secret 07ea8e4edefb9a9e57c247e8d1a4f51c principals ...
+```
+
+If `--client-id` and `--client-secret` are not provided, the Polaris CLI will try to read the client ID and client secret from environment variables called `CLIENT_ID` and `CLIENT_SECRET` respectively. If these flags are not provided and the environment variables are not set, the CLI will fail.
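Exporting the two environment variables keeps the secret off the command line, where it would otherwise be visible in shell history and process listings. The helper below is a hypothetical pre-flight check, not part of the Polaris CLI. It only reports which source the lookup rules just described would use: flags first, then `CLIENT_ID`/`CLIENT_SECRET`, otherwise failure. As a simplification, it falls back to the environment when only one of the two flags is given.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the Polaris CLI) that mirrors the
# documented credential lookup: flags first, then CLIENT_ID/CLIENT_SECRET, else fail.
set -euo pipefail

# credential_source [FLAG_CLIENT_ID] [FLAG_CLIENT_SECRET]
credential_source() {
  local flag_id=${1:-} flag_secret=${2:-}
  if [ -n "$flag_id" ] && [ -n "$flag_secret" ]; then
    echo "flags"
  elif [ -n "${CLIENT_ID:-}" ] && [ -n "${CLIENT_SECRET:-}" ]; then
    echo "environment"
  else
    echo "no credentials: pass --client-id/--client-secret or set CLIENT_ID/CLIENT_SECRET" >&2
    return 1
  fi
}
```

For example, after `export CLIENT_ID=4b5ed1ca908c3cc2` and `export CLIENT_SECRET=07ea8e4edefb9a9e57c247e8d1a4f51c`, running `credential_source && polaris principals list` calls the CLI only when a credential source is available.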
+
+If the `--host` and `--port` options are not provided, the CLI will default to communicating with `localhost:8181`.
+
+### PATH
+
+These examples assume the Polaris CLI is on the PATH and so can be invoked just by the command `polaris`. You can add the CLI to your PATH environment variable with a command like the following:
+
+```
+export PATH="$HOME/polaris:$PATH"
+```
+
+Alternatively, you can run the CLI by providing a path to it, such as with the following invocation:
+
+```
+~/polaris principals list
+```
+
+## Commands
+
+Each of the commands `catalogs`, `principals`, `principal-roles`, `catalog-roles`, `namespaces`, and `privileges` is used to manage a different type of entity within Polaris.
+
+To find details on the options that can be provided to a particular command or subcommand, use the `--help` flag. For example:
+
+```
+polaris catalogs --help
+polaris principals create --help
+```
+
+### catalogs
+
+The `catalogs` command is used to create, discover, and otherwise manage catalogs within Polaris.
+
+`catalogs` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+
+#### create
+
+The `create` subcommand is used to create a catalog.
+
+```
+input: polaris catalogs create --help
+options:
+  create
+    Named arguments:
+      --type  The type of catalog to create in [INTERNAL, EXTERNAL]. INTERNAL by default.
+      --storage-type  (Required) The type of storage to use for the catalog
+      --default-base-location  (Required) Default base location of the catalog
+      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
+      --role-arn  (Required for S3) A role ARN to use when connecting to S3
+      --external-id  (Only for S3) The external ID to use when connecting to S3
+      --tenant-id  (Required for Azure) A tenant ID to use when connecting to Azure Storage
+      --multi-tenant-app-name  (Only for Azure) The app name to use when connecting to Azure Storage
+      --consent-url  (Only for Azure) A consent URL granting permissions for the Azure Storage location
+      --service-account  (Only for GCS) The service account to use when connecting to GCS
+      --remote-url  (For external catalogs) The remote URL to use
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs create \
+  --storage-type s3 \
+  --default-base-location s3://example-bucket/my_data \
+  --role-arn ${ROLE_ARN} \
+  my_catalog
+
+polaris catalogs create \
+  --storage-type s3 \
+  --default-base-location s3://example-bucket/my_other_data \
+  --allowed-location s3://example-bucket/second_location \
+  --allowed-location s3://other-bucket/third_location \
+  --role-arn ${ROLE_ARN} \
+  my_other_catalog
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a catalog.
+
+```
+input: polaris catalogs delete --help
+options:
+  delete
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs delete some_catalog
+```
+
+#### get
+
+The `get` subcommand is used to retrieve details about a catalog.
+
+```
+input: polaris catalogs get --help
+options:
+  get
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs get some_catalog
+
+polaris catalogs get another_catalog
+```
+
+#### list
+
+The `list` subcommand is used to show details about all catalogs, or those that a certain principal role has access to. The principal used to perform this operation must have the `CATALOG_LIST` privilege.
+
+```
+input: polaris catalogs list --help
+options:
+  list
+    Named arguments:
+      --principal-role  The name of a principal role
+```
+
+##### Examples
+
+```
+polaris catalogs list
+
+polaris catalogs list --principal-role some_user
+```
+
+#### update
+
+The `update` subcommand is used to update a catalog. Currently, this command supports changing the properties of a catalog or updating its storage configuration.
+
+```
+input: polaris catalogs update --help
+options:
+  update
+    Named arguments:
+      --default-base-location  (Required) Default base location of the catalog
+      --allowed-location  An allowed location for files tracked by the catalog. Multiple locations can be provided by specifying this option more than once.
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalogs update --property tag=new_value my_catalog
+
+polaris catalogs update --default-base-location s3://new-bucket/my_data my_catalog
+```
+
+### Principals
+
+The `principals` command is used to manage principals within Polaris.
+
+`principals` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. rotate-credentials
+6. update
+
+#### create
+
+The `create` subcommand is used to create a new principal.
+
+```
+input: polaris principals create --help
+options:
+  create
+    Named arguments:
+      --type  The type of principal to create in [SERVICE]
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals create some_user
+
+polaris principals create --client-id ${CLIENT_ID} --property admin=true some_admin_user
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a principal.
+
+```
+input: polaris principals delete --help
+options:
+  delete
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals delete some_user
+
+polaris principals delete some_admin_user
+```
+
+#### get
+
+The `get` subcommand retrieves details about a principal.
+
+```
+input: polaris principals get --help
+options:
+  get
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals get some_user
+
+polaris principals get some_admin_user
+```
+
+#### list
+
+The `list` subcommand shows details about all principals.
+
+##### Examples
+
+```
+polaris principals list
+```
+
+#### rotate-credentials
+
+The `rotate-credentials` subcommand is used to update the credentials used by a principal. After this command runs successfully, the new credentials will be printed to stdout.
+
+```
+input: polaris principals rotate-credentials --help
+options:
+  rotate-credentials
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals rotate-credentials some_user
+
+polaris principals rotate-credentials some_admin_user
+```
+
+#### update
+
+The `update` subcommand is used to update a principal. Currently, this supports rewriting the properties associated with a principal.
+
+```
+input: polaris principals update --help
+options:
+  update
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal
+```
+
+##### Examples
+
+```
+polaris principals update --property key=value --property other_key=other_value some_user
+
+polaris principals update --property are_other_keys_removed=yes some_user
+```
+
+### Principal Roles
+
+The `principal-roles` command is used to create, discover, and manage principal roles within Polaris. Additionally, this command can identify principals or catalog roles associated with a principal role, and can be used to grant a principal role to a principal.
+
+`principal-roles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+6. grant
+7. revoke
+
+#### create
+
+The `create` subcommand is used to create a new principal role.
+
+```
+input: polaris principal-roles create --help
+options:
+  create
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles create data_engineer
+
+polaris principal-roles create --property key=value data_analyst
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a principal role.
+
+```
+input: polaris principal-roles delete --help
+options:
+  delete
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles delete data_engineer
+
+polaris principal-roles delete data_analyst
+```
+
+#### get
+
+The `get` subcommand retrieves details about a principal role.
+
+```
+input: polaris principal-roles get --help
+options:
+  get
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles get data_engineer
+
+polaris principal-roles get data_analyst
+```
+
+#### list
+
+The `list` subcommand is used to print out all principal roles or, alternatively, to list all principal roles associated with a given principal or with a given catalog role.
+
+```
+input: polaris principal-roles list --help
+options:
+  list
+    Named arguments:
+      --catalog-role  The name of a catalog role. If provided, show only principal roles assigned to this catalog role.
+      --principal  The name of a principal. If provided, show only principal roles assigned to this principal.
+```
+
+##### Examples
+
+```
+polaris principal-roles list
+
+polaris principal-roles list --principal d.knuth
+
+polaris principal-roles list --catalog-role super_secret_data
+```
+
+#### update
+
+The `update` subcommand is used to update a principal role. Currently, this supports updating the properties tied to a principal role.
+
+```
+input: polaris principal-roles update --help
+options:
+  update
+    Named arguments:
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles update --property key=value2 data_engineer
+
+polaris principal-roles update data_analyst --property key=value3
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a principal role to a principal.
+
+```
+input: polaris principal-roles grant --help
+options:
+  grant
+    Named arguments:
+      --principal  A principal to grant this principal role to
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles grant --principal d.knuth data_engineer
+
+polaris principal-roles grant data_scientist --principal a.ng
+```
+
+#### revoke
+
+The `revoke` subcommand is used to revoke a principal role from a principal.
+
+```
+input: polaris principal-roles revoke --help
+options:
+  revoke
+    Named arguments:
+      --principal  A principal to revoke this principal role from
+    Positional arguments:
+      principal_role
+```
+
+##### Examples
+
+```
+polaris principal-roles revoke --principal former.employee data_engineer
+
+polaris principal-roles revoke data_scientist --principal changed.role
+```
+
+### Catalog Roles
+
+The `catalog-roles` command is used to create, discover, and manage catalog roles within Polaris. Additionally, this command can be used to grant a catalog role to a principal role.
+
+`catalog-roles` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+5. update
+6. grant
+7. revoke
+
+#### create
+
+The `create` subcommand is used to create a new catalog role.
+
+```
+input: polaris catalog-roles create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles create --property key=value --catalog some_catalog sales_data
+
+polaris catalog-roles create --catalog other_catalog sales_data
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a catalog role.
+
+```
+input: polaris catalog-roles delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles delete --catalog some_catalog sales_data
+
+polaris catalog-roles delete --catalog other_catalog sales_data
+```
+
+#### get
+
+The `get` subcommand retrieves details about a catalog role.
+
+```
+input: polaris catalog-roles get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles get --catalog some_catalog inventory_data
+
+polaris catalog-roles get --catalog other_catalog inventory_data
+```
+
+#### list
+
+The `list` subcommand is used to print all catalog roles in a catalog. Alternatively, if a principal role is provided, only catalog roles associated with that principal role are shown.
+
+```
+input: polaris catalog-roles list --help
+options:
+  list
+    Named arguments:
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog
+```
+
+##### Examples
+
+```
+polaris catalog-roles list my_catalog
+
+polaris catalog-roles list --principal-role data_engineer my_catalog
+```
+
+#### update
+
+The `update` subcommand is used to update a catalog role. Currently, only updating properties associated with the catalog role is supported.
+
+```
+input: polaris catalog-roles update --help
+options:
+  update
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles update --property contains_pii=true --catalog some_catalog sales_data
+
+polaris catalog-roles update sales_data --catalog some_catalog --property key=value
+```
+
+#### grant
+
+The `grant` subcommand is used to grant a catalog role to a principal role.
+
+```
+input: polaris catalog-roles grant --help
+options:
+  grant
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles grant sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles grant --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+#### revoke
+
+The `revoke` subcommand is used to revoke a catalog role from a principal role.
+
+```
+input: polaris catalog-roles revoke --help
+options:
+  revoke
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --principal-role  The name of a principal role
+    Positional arguments:
+      catalog_role
+```
+
+##### Examples
+
+```
+polaris catalog-roles revoke sensitive_data --catalog some_catalog --principal-role power_user
+
+polaris catalog-roles revoke --catalog sales_data contains_cc_info_catalog_role --principal-role financial_analyst_role
+```
+
+### Namespaces
+
+The `namespaces` command is used to manage namespaces within Polaris.
+
+`namespaces` supports the following subcommands:
+
+1. create
+2. delete
+3. get
+4. list
+
+#### create
+
+The `create` subcommand is used to create a new namespace.
+
+When creating a namespace with an explicit location, that location must reside within the parent catalog or namespace.
+
+```
+input: polaris namespaces create --help
+options:
+  create
+    Named arguments:
+      --catalog  The name of an existing catalog
+      --location  If specified, the location at which to store the namespace and entities inside it
+      --property  A key/value pair such as: tag=value. Multiple can be provided by specifying this option more than once
+    Positional arguments:
+      namespace
+```
+
+##### Examples
+
+```
+polaris namespaces create --catalog my_catalog outer
+
+polaris namespaces create --catalog my_catalog --location 's3://bucket/outer/inner_SUFFIX' outer.inner
+```
+
+#### delete
+
+The `delete` subcommand is used to delete a namespace.
+
+```
+input: polaris namespaces delete --help
+options:
+  delete
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      namespace
+```
+
+##### Examples
+
+```
+polaris namespaces delete outer_namespace.inner_namespace --catalog my_catalog
+
+polaris namespaces delete --catalog my_catalog outer_namespace
+```
+
+#### get
+
+The `get` subcommand retrieves details about a namespace.
+
+```
+input: polaris namespaces get --help
+options:
+  get
+    Named arguments:
+      --catalog  The name of an existing catalog
+    Positional arguments:
+      namespace
+```
+
+##### Examples
+
+```
+polaris namespaces get --catalog some_catalog a.b
+
+polaris namespaces get a.b.c --catalog some_catalog
+```
+
+#### list
+
+The `list` subcommand shows details about all namespaces directly within a catalog or, optionally, within some parent prefix in that catalog.
+ + ``` +input: polaris namespaces list --help +options: + list + Named arguments: + --catalog The name of an existing catalog + --parent If specified, list namespaces inside this parent namespace +``` + +##### Examples + +``` +polaris namespaces list --catalog my_catalog + +polaris namespaces list --catalog my_catalog --parent a + +polaris namespaces list --catalog my_catalog --parent a.b +``` + +### Privileges + +The `privileges` command is used to grant various privileges to a catalog role, or to revoke those privileges. Privileges can be granted at the level of a catalog, a namespace, a table, or a view. For more information on privileges, please refer to the [docs]({{% ref "entities#privilege" %}}). + +Note that when using the `privileges` command, the user specifies the relevant catalog and catalog role before selecting a subcommand. + +`privileges` supports the following subcommands: + +1. list +2. catalog +3. namespace +4. table +5. view + +Each of these subcommands, except `list`, supports the `grant` and `revoke` actions and requires an action to be specified. + +Note that each subcommand's `revoke` action always accepts the same options that the corresponding `grant` action does, but with the addition of the `cascade` option. `cascade` is used to revoke all other privileges that depend on the specified privilege. + +#### list + +The `list` subcommand shows details about all privileges for a catalog role. + +``` +input: polaris privileges list --help +options: + list + Named arguments: + --catalog The name of an existing catalog + --catalog-role The name of a catalog role +``` + +##### Examples + +``` +polaris privileges list --catalog my_catalog --catalog-role my_role + +polaris privileges list --catalog-role my_other_role --catalog my_catalog +``` + +#### catalog + +The `catalog` subcommand manages privileges at the catalog level. `grant` is used to grant catalog privileges to the specified catalog role, and `revoke` is used to revoke them.
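Every privilege subcommand except `list` requires a `grant` or `revoke` action, and only `revoke` accepts `--cascade`. A wrapper script may therefore want to assemble and check the arguments before invoking the CLI. The following Bash sketch is a dry run that only prints the command. `privilege_cmd` and its positional argument order are hypothetical, and the flag order simply mirrors the `catalog` help output in this section.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: print a `polaris privileges catalog` command.
# Arguments: <grant|revoke> <catalog> <catalog_role> <privilege> [cascade]
privilege_cmd() {
  local action="$1" catalog="$2" catalog_role="$3" privilege="$4" cascade="${5:-}"
  local -a cmd=(polaris privileges catalog "${action}" --catalog "${catalog}" --catalog-role "${catalog_role}")
  case "${action}" in
    grant)
      # --cascade is accepted only by revoke, so reject it for grant.
      if [[ -n "${cascade}" ]]; then
        echo "error: --cascade is only valid with revoke" >&2
        return 1
      fi
      ;;
    revoke)
      if [[ -n "${cascade}" ]]; then
        cmd+=(--cascade)
      fi
      ;;
    *)
      echo "error: action must be grant or revoke" >&2
      return 1
      ;;
  esac
  # The privilege is the positional argument, so it goes last.
  cmd+=("${privilege}")
  printf '%s\n' "${cmd[*]}"
}

privilege_cmd grant my_catalog catalog_role TABLE_CREATE
privilege_cmd revoke my_catalog catalog_role TABLE_CREATE cascade
```

Keeping the check in one place means a misplaced `--cascade` on a `grant` fails before any request is sent.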
+ +``` +input: polaris privileges catalog --help +options: + catalog + grant + Named arguments: + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + catalog \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + TABLE_CREATE + +polaris privileges \ + catalog \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --cascade \ + TABLE_CREATE +``` + +#### namespace + +The `namespace` subcommand manages privileges at the namespace level. + +``` +input: polaris privileges namespace --help +options: + namespace + grant + Named arguments: + --namespace A period-delimited namespace + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --namespace A period-delimited namespace + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + namespace \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + TABLE_LIST + +polaris privileges \ + namespace \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + TABLE_LIST +``` + +#### table + +The `table` subcommand manages privileges at the table level. 
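Scripts often start from a fully qualified identifier such as `a.b.t`, but the `table` subcommand takes the namespace and the table name as separate `--namespace` and `--table` flags. The minimal Bash sketch below splits off the last period-delimited segment as the table name and keeps the rest as the namespace. `split_table_identifier` is a hypothetical helper, and it assumes no level contains a literal period.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn "a.b.t" into "--namespace a.b --table t".
split_table_identifier() {
  local identifier="$1"
  # A table must live in at least one namespace level.
  if [[ "${identifier}" != *.* ]]; then
    echo "error: expected <namespace>.<table>, got '${identifier}'" >&2
    return 1
  fi
  local namespace="${identifier%.*}"  # everything before the last period
  local table="${identifier##*.}"     # everything after the last period
  printf '%s\n' "--namespace ${namespace} --table ${table}"
}

split_table_identifier a.b.t
```

The output can be spliced unquoted into a command line, for example `polaris privileges table grant --catalog my_catalog --catalog-role catalog_role $(split_table_identifier a.b.t) TABLE_DROP`. That relies on word splitting, so it is only safe for names without whitespace.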
+ + ``` +input: polaris privileges table --help +options: + table + grant + Named arguments: + --namespace A period-delimited namespace + --table The name of a table + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --namespace A period-delimited namespace + --table The name of a table + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + table \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + --table t \ + TABLE_DROP + +polaris privileges \ + table \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b \ + --table t \ + --cascade \ + TABLE_DROP +``` + +#### view + +The `view` subcommand manages privileges at the view level.
+ + ``` +input: polaris privileges view --help +options: + view + grant + Named arguments: + --namespace A period-delimited namespace + --view The name of a view + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege + revoke + Named arguments: + --namespace A period-delimited namespace + --view The name of a view + --cascade When revoking privileges, additionally revoke privileges that depend on the specified privilege + --catalog The name of an existing catalog + --catalog-role The name of a catalog role + Positional arguments: + privilege +``` + +##### Examples + +``` +polaris privileges \ + view \ + grant \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b.c \ + --view v \ + VIEW_FULL_METADATA + +polaris privileges \ + view \ + revoke \ + --catalog my_catalog \ + --catalog-role catalog_role \ + --namespace a.b.c \ + --view v \ + --cascade \ + VIEW_FULL_METADATA +``` + +## Examples + +This section outlines example code for a few common operations as well as for some more complex ones. + +For especially complex operations, you may wish to use the Python API directly instead.
+ +### Creating a principal and a catalog + +``` +polaris principals create my_user + +polaris catalogs create \ + --type internal \ + --storage-type s3 \ + --default-base-location s3://iceberg-bucket/polaris-base \ + --role-arn arn:aws:iam::111122223333:role/ExampleCorpRole \ + --allowed-location s3://iceberg-bucket/polaris-alt-location-1 \ + --allowed-location s3://iceberg-bucket/polaris-alt-location-2 \ + my_catalog +``` + +### Granting a principal the ability to manage the content of a catalog + +``` +polaris principal-roles create power_user +polaris principal-roles grant --principal my_user power_user + +polaris catalog-roles create --catalog my_catalog my_catalog_role +polaris catalog-roles grant \ + --catalog my_catalog \ + --principal-role power_user \ + my_catalog_role + +polaris privileges \ + catalog \ + --catalog my_catalog \ + --catalog-role my_catalog_role \ + grant \ + CATALOG_MANAGE_CONTENT +``` + +### Identifying the tables a given principal has been granted explicit access to read + +_Note that some other privileges, such as `CATALOG_MANAGE_CONTENT`, subsume `TABLE_READ_DATA` and would not be discovered here._ + +``` +catalog=my_catalog +principal_roles=$(polaris principal-roles list --principal my_principal) +for principal_role in ${principal_roles}; do + catalog_roles=$(polaris catalog-roles list --principal-role "${principal_role}" "${catalog}") + for catalog_role in ${catalog_roles}; do + grants=$(polaris privileges list --catalog-role "${catalog_role}" --catalog "${catalog}") + for grant in $(echo "${grants}" | jq -c '.[] | select(.privilege == "TABLE_READ_DATA")'); do + echo "${grant}" + done + done +done +``` + + diff --git a/site/content/in-dev/0.9.0/configuring-polaris-for-production.md b/site/content/in-dev/0.9.0/configuring-polaris-for-production.md new file mode 100644 index 00000000..152d12fd --- /dev/null +++ b/site/content/in-dev/0.9.0/configuring-polaris-for-production.md @@ -0,0 +1,131 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under
one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Configuring Apache Polaris (Incubating) for Production +linkTitle: Deploying In Production +type: docs +weight: 600 +--- + +The default `polaris-server.yml` configuration is intended for development and testing. When deploying Polaris in production, there are several best practices to keep in mind. + +## Security + +### Configurations + +Notable configurations used to secure a Polaris deployment are outlined below. + +#### oauth2 + +> [!WARNING] +> Ensure that the `tokenBroker` setting reflects the token broker specified in `authenticator` below. + +* Configure [OAuth](https://oauth.net/2/) with this setting. Remove the `TestInlineBearerTokenPolarisAuthenticator` option and uncomment the `DefaultPolarisAuthenticator` authenticator option beneath it. +* Then, configure the token broker. You can configure the token broker to use either [asymmetric](https://github.com/apache/polaris/blob/b482617bf8cc508b37dbedf3ebc81a9408160a5e/polaris-service/src/main/java/io/polaris/service/auth/JWTRSAKeyPair.java#L24) or [symmetric](https://github.com/apache/polaris/blob/b482617bf8cc508b37dbedf3ebc81a9408160a5e/polaris-service/src/main/java/io/polaris/service/auth/JWTSymmetricKeyBroker.java#L23) keys.
+ +#### authenticator.tokenBroker + +> [!WARNING] +> Ensure that the `tokenBroker` setting reflects the token broker specified in `oauth2` above. + +#### callContextResolver & realmContextResolver +* Use these configurations to specify a service that can resolve a realm from bearer tokens. +* The service(s) used here must implement the relevant interfaces (i.e. [CallContextResolver](https://github.com/apache/polaris/blob/8290019c10290a600e40b35ddb1e2f54bf99e120/polaris-service/src/main/java/io/polaris/service/context/CallContextResolver.java#L27) and [RealmContextResolver](https://github.com/apache/polaris/blob/7ce86f10a68a3b56aed766235c88d6027c0de038/polaris-service/src/main/java/io/polaris/service/context/RealmContextResolver.java)). + +## Metastore Management + +> [!IMPORTANT] +> The default `in-memory` implementation for `metaStoreManager` is meant for testing and is not suitable for production usage. Instead, consider an implementation such as `eclipse-link`, which allows you to store metadata in a remote database. + +A metastore manager should be configured with an implementation that durably persists Polaris entities. Use the configuration `metaStoreManager` to configure a [MetastoreManager](https://github.com/apache/polaris/blob/627dc602eb15a3258dcc32babf8def34cf6de0e9/polaris-core/src/main/java/io/polaris/core/persistence/PolarisMetaStoreManager.java#L47) implementation where Polaris entities will be persisted. + +Be sure to secure your metastore backend since it will be storing credentials and catalog metadata. + +### Configuring EclipseLink + +To use EclipseLink for metastore management, specify the configuration `metaStoreManager.conf-file` to point to an EclipseLink `persistence.xml` file. This file, local to the Polaris service, contains details of the database used for metastore management and the connection settings. For more information, refer to the [metastore documentation]({{% ref "metastores" %}}). + +> [!IMPORTANT] +> EclipseLink requires +> 1.
Building the JAR for the EclipseLink extension +> 2. Setting the `eclipseLink` Gradle property to `true`. +> +> This can be achieved by setting `eclipseLink=true` in the `gradle.properties` file, or by passing the property explicitly while building all JARs, e.g.: `./gradlew -PeclipseLink=true clean assemble` + +### Bootstrapping + +When using a metastore manager other than `in-memory`, you must **bootstrap** the metastore manager before using Polaris. This is a manual operation that must be performed **only once** to prepare the metastore manager to integrate with Polaris. When the metastore manager is bootstrapped, any existing Polaris entities in the metastore manager may be **purged**. + +By default, Polaris will create a random `CLIENT_ID` and `CLIENT_SECRET` for the `root` principal and store their hashes in the metastore backend. To provide your own credentials for the `root` principal (so you can request tokens via `api/catalog/v1/oauth/tokens`), set the following environment variables for the realm name `my_realm`: + +``` +export POLARIS_BOOTSTRAP_MY_REALM_ROOT_CLIENT_ID=my-client-id +export POLARIS_BOOTSTRAP_MY_REALM_ROOT_CLIENT_SECRET=my-client-secret +``` + +**IMPORTANT**: If you use `default-realm` for the metastore backend database, you won't be able to use the `export` command, because shell variable names cannot contain `-`.
Use this instead: + +```bash +env POLARIS_BOOTSTRAP_DEFAULT-REALM_ROOT_CLIENT_ID=my-client-id POLARIS_BOOTSTRAP_DEFAULT-REALM_ROOT_CLIENT_SECRET=my-client-secret <bootstrap command> +``` + +Now, to bootstrap Polaris, run: + +```bash +java -jar /path/to/jar/polaris-service-all.jar bootstrap polaris-server.yml +``` + +or in a container: + +```bash +bin/polaris-service bootstrap config/polaris-server.yml +``` + +Afterward, Polaris can be launched normally: + +```bash +java -jar /path/to/jar/polaris-service-all.jar server polaris-server.yml +``` + +You can verify the setup by requesting a token for the `root` principal: + +```bash +curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens -d "grant_type=client_credentials&client_id=my-client-id&client_secret=my-client-secret&scope=PRINCIPAL_ROLE:ALL" +``` + +which should return: + +```json +{"access_token":"...","token_type":"bearer","issued_token_type":"urn:ietf:params:oauth:token-type:access_token","expires_in":3600} +``` + +Note that if you use a non-default realm name in your `polaris-server.yml` (for example, `iceberg` instead of `default-realm`), you should add an appropriate request header: +```bash +curl -X POST -H 'realm: iceberg' http://localhost:8181/api/catalog/v1/oauth/tokens -d "grant_type=client_credentials&client_id=my-client-id&client_secret=my-client-secret&scope=PRINCIPAL_ROLE:ALL" +``` + +## Other Configurations + +When deploying Polaris in production, consider adjusting the following configurations: + +#### featureConfiguration.SUPPORTED_CATALOG_STORAGE_TYPES + - By default, Polaris catalogs are allowed to be located in the local filesystem with the `FILE` storage type. This should be disabled for production systems. + - Use this configuration to additionally disable any other storage types that will not be in use.
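The bootstrap variable names shown earlier embed the realm name in upper case. A realm such as `default-realm` therefore yields a name containing `-`, which `export` rejects but `env` can still pass to a child process. The Bash sketch below derives the variable name and checks whether it is a valid shell identifier. The helper names are hypothetical, and the upper-casing rule is inferred from the `my_realm` and `default-realm` examples above rather than taken from the Polaris source.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the bootstrap client-id variable name for a realm,
# following the pattern of the examples above (realm name upper-cased).
bootstrap_client_id_var() {
  local upper
  upper="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  printf 'POLARIS_BOOTSTRAP_%s_ROOT_CLIENT_ID\n' "${upper}"
}

# `export` only works for valid shell identifiers:
# letters, digits, and underscores, not starting with a digit.
is_exportable() {
  [[ "$1" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]
}

for realm in my_realm default-realm; do
  var="$(bootstrap_client_id_var "${realm}")"
  if is_exportable "${var}"; then
    printf '%s: use export\n' "${var}"
  else
    printf '%s: use env\n' "${var}"
  fi
done

# `env` accepts the hyphenated name and hands it to the child process.
env 'POLARIS_BOOTSTRAP_DEFAULT-REALM_ROOT_CLIENT_ID=my-client-id' env | grep '^POLARIS_BOOTSTRAP_DEFAULT-REALM_'
```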
+ + diff --git a/site/content/in-dev/0.9.0/entities.md b/site/content/in-dev/0.9.0/entities.md new file mode 100644 index 00000000..0e02c6a8 --- /dev/null +++ b/site/content/in-dev/0.9.0/entities.md @@ -0,0 +1,89 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Entities +type: docs +weight: 400 +--- + +This page documents various entities that can be managed in Apache Polaris (Incubating). + +## Catalog + +A catalog is a top-level entity in Polaris that may contain other entities like [namespaces](#namespace) and [tables](#table). These map directly to [Apache Iceberg catalogs](https://iceberg.apache.org/concepts/catalog/). + +For information on managing catalogs with the REST API or for more information on what data can be associated with a catalog, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateCatalogRequest.md" %}}). + +### Storage Type + +All catalogs in Polaris are associated with a _storage type_. Valid storage types are `S3`, `Azure`, and `GCS`. The `FILE` type is also available for testing. Each of these types relates to a different storage provider where data within the catalog may reside.
Depending on the storage type, various other configurations may be set for a catalog including credentials to be used when accessing data inside the catalog. + +For details on how to use Storage Types in the REST API, see [the API docs]({{% github-polaris "regtests/client/python/docs/StorageConfigInfo.md" %}}). + +## Namespace + +A namespace is a logical entity that resides within a [catalog](#catalog) and can contain other entities such as [tables](#table) or [views](#view). Some other systems may refer to namespaces as _schemas_ or _databases_. + +In Polaris, namespaces can be nested. For example, `a.b.c.d.e.f.g` is a valid namespace. `b` is said to reside within `a`, and so on. + +For information on managing namespaces with the REST API or for more information on what data can be associated with a namespace, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateNamespaceRequest.md" %}}). + + +## Table + +Polaris tables are entities that map to [Apache Iceberg tables](https://iceberg.apache.org/docs/nightly/configuration/). + +For information on managing tables with the REST API or for more information on what data can be associated with a table, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateTableRequest.md" %}}). + +## View + +Polaris views are entities that map to [Apache Iceberg views](https://iceberg.apache.org/view-spec/). + +For information on managing views with the REST API or for more information on what data can be associated with a view, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreateViewRequest.md" %}}). + +## Principal + +Polaris principals are unique identities that can be used to represent users or services. Each principal may have one or more [principal roles](#principal-role) assigned to it for the purpose of accessing catalogs and the entities within them. 
+ +For information on managing principals with the REST API or for more information on what data can be associated with a principal, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreatePrincipalRequest.md" %}}). + +## Principal Role + +Polaris principal roles are labels that may be granted to [principals](#principal). Each principal may have one or more principal roles, and the same principal role may be granted to multiple principals. Principal roles may be assigned based on the persona or responsibilities of a given principal, or on how that principal will need to access different entities within Polaris. + +For information on managing principal roles with the REST API or for more information on what data can be associated with a principal role, see [the API docs]({{% github-polaris "regtests/client/python/docs/CreatePrincipalRoleRequest.md" %}}). + + +## Catalog Role + +Polaris catalog roles are labels that may be granted to [catalogs](#catalog). Each catalog may have one or more catalog roles, and the same catalog role may be granted to multiple catalogs. Catalog roles may be assigned based on the nature of data that will reside in a catalog, or by the groups of users and services that might need to access that data. + +Each catalog role may have multiple [privileges](#privilege) granted to it, and each catalog role can be granted to one or more [principal roles](#principal-role). This is the mechanism by which principals are granted access to entities inside a catalog such as namespaces and tables. + +## Privilege + +Polaris privileges are granted to [catalog roles](#catalog-role) in order to grant principals with a given principal role some degree of access to catalogs with a given catalog role. When a privilege is granted to a catalog role, any principal roles granted that catalog role receive the privilege. In turn, any principals who are granted that principal role receive it. 
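The grant chain described above can be modeled as a small lookup to see which privileges a principal ends up with: privileges are granted to catalog roles, catalog roles to principal roles, and principal roles to principals. This is a toy Bash 4 sketch using associative arrays with made-up role and privilege assignments. It illustrates only the resolution order and is not how Polaris stores or evaluates grants.

```bash
#!/usr/bin/env bash
# Toy model of the grant chain (requires Bash 4+ for associative arrays).
# All names and assignments below are hypothetical example data.
declare -A principal_roles=(   # principal -> principal roles
  [my_user]="power_user analyst"
)
declare -A catalog_roles=(     # principal role -> catalog roles
  [power_user]="my_catalog_role"
  [analyst]="reader_role"
)
declare -A privileges=(        # catalog role -> privileges
  [my_catalog_role]="CATALOG_MANAGE_CONTENT"
  [reader_role]="TABLE_READ_DATA TABLE_LIST"
)

# Print every privilege a principal receives through its roles,
# one per line, sorted and de-duplicated.
effective_privileges() {
  local principal="$1" principal_role catalog_role privilege
  for principal_role in ${principal_roles[$principal]}; do
    for catalog_role in ${catalog_roles[$principal_role]}; do
      for privilege in ${privileges[$catalog_role]}; do
        printf '%s\n' "${privilege}"
      done
    done
  done | sort -u
}

effective_privileges my_user
```

Revoking `reader_role` from `analyst` in this model would remove `TABLE_READ_DATA` and `TABLE_LIST` from `my_user` without touching the other path, which mirrors how access flows through roles rather than directly to principals.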
+ +A privilege can be scoped to any entity inside a catalog, including the catalog itself. + +For a list of supported privileges for each privilege class, see the API docs: +* [Table Privileges]({{% github-polaris "regtests/client/python/docs/TablePrivilege.md" %}}) +* [View Privileges]({{% github-polaris "regtests/client/python/docs/ViewPrivilege.md" %}}) +* [Namespace Privileges]({{% github-polaris "regtests/client/python/docs/NamespacePrivilege.md" %}}) +* [Catalog Privileges]({{% github-polaris "regtests/client/python/docs/CatalogPrivilege.md" %}}) diff --git a/site/content/in-dev/0.9.0/metastores.md b/site/content/in-dev/0.9.0/metastores.md new file mode 100644 index 00000000..74766c9c --- /dev/null +++ b/site/content/in-dev/0.9.0/metastores.md @@ -0,0 +1,112 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: Metastores +linkTitle: Metastores +type: docs +weight: 700 +--- + +This page documents important configurations for connecting to a production database through [EclipseLink](https://eclipse.dev/eclipselink/).
+ +## Polaris Server Configuration +Configure the `metaStoreManager` section in the Polaris configuration (`polaris-server.yml` by default) as follows: +``` +metaStoreManager: + type: eclipse-link + conf-file: META-INF/persistence.xml + persistence-unit: polaris +``` + +`conf-file` must point to an [EclipseLink configuration file](https://eclipse.dev/eclipselink/documentation/2.5/solutions/testingjpa002.htm). + +By default, `conf-file` points to the embedded resource file `META-INF/persistence.xml` in the `polaris-eclipselink` module. + +To specify a configuration file outside the classpath, follow these steps: +1) Place `persistence.xml` into a jar file: `jar cvf /tmp/conf.jar persistence.xml` +2) Use `conf-file: /tmp/conf.jar!/persistence.xml` + +## EclipseLink Configuration - persistence.xml +The configuration file `persistence.xml` is used to set up the database connection properties, which can differ depending on the type of database and its configuration. + +Check out the default [persistence.xml](https://github.com/apache/polaris/blob/main/extension/persistence/eclipselink/src/main/resources/META-INF/persistence.xml) for a complete sample for connecting to the file-based H2 database. + +Polaris creates and connects to a separate database for each realm. Specifically, the `{realm}` placeholder in `jakarta.persistence.jdbc.url` is substituted with the actual realm name, allowing the Polaris server to connect to different databases based on the realm. + +> Note: some database systems, such as Postgres, don't create databases automatically. Database admins need to create them manually before running the Polaris server.
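Because `{realm}` in `jakarta.persistence.jdbc.url` is replaced with the realm name, each realm ends up with its own database. On systems like Postgres, those databases must exist before Polaris starts. The Bash sketch below shows that substitution for a list of realms and prints matching `CREATE DATABASE` statements for an administrator to review (for example, before running them with `psql`). The realm names are hypothetical, and the script performs plain string replacement rather than anything Polaris-specific.

```bash
#!/usr/bin/env bash
# JDBC URL template in the same form as the Postgres example on this page.
url_template='jdbc:postgresql://localhost:5432/{realm}'
placeholder='{realm}'

# Hypothetical realm names; substitute the realms your deployment uses.
for realm in default-realm iceberg; do
  # Quoting "${placeholder}" makes bash match the braces literally.
  url="${url_template//"${placeholder}"/${realm}}"
  printf '%s -> %s\n' "${realm}" "${url}"
  # Double-quoted identifiers let Postgres accept names such as default-realm.
  printf 'CREATE DATABASE "%s";\n' "${realm}"
done
```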
+```xml +<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL"> + <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider> + <class>org.apache.polaris.jpa.models.ModelEntity</class> + <class>org.apache.polaris.jpa.models.ModelEntityActive</class> + <class>org.apache.polaris.jpa.models.ModelEntityChangeTracking</class> + <class>org.apache.polaris.jpa.models.ModelEntityDropped</class> + <class>org.apache.polaris.jpa.models.ModelGrantRecord</class> + <class>org.apache.polaris.jpa.models.ModelPrincipalSecrets</class> + <class>org.apache.polaris.jpa.models.ModelSequenceId</class> + <shared-cache-mode>NONE</shared-cache-mode> + <properties> + <property name="jakarta.persistence.jdbc.url" + value="jdbc:h2:file:tmp/polaris_test/filedb_{realm}"/> + <property name="jakarta.persistence.jdbc.user" value="sa"/> + <property name="jakarta.persistence.jdbc.password" value=""/> + <property name="jakarta.persistence.schema-generation.database.action" value="create"/> + </properties> +</persistence-unit> +``` + +A single `persistence.xml` can describe multiple [persistence units](https://eclipse.dev/eclipselink/documentation/2.6/concepts/app_dev001.htm). For example, with both a `polaris-dev` and `polaris` persistence unit defined, you could use a single `persistence.xml` to easily switch between development and production databases. Use `persistence-unit` in the Polaris server configuration to easily switch between persistence units. + +To build Polaris with the necessary H2 dependency and start the Polaris service, run the following: +```bash +polaris> ./gradlew --no-daemon --info -PeclipseLink=true -PeclipseLinkDeps=com.h2database:h2:2.3.232 clean shadowJar +polaris> java -jar dropwizard/service/build/libs/polaris-dropwizard-service-*.jar server ./polaris-server.yml +``` + +### Postgres + +The following shows a sample configuration for integrating Polaris with Postgres. 
+ +```xml +<persistence-unit name="polaris" transaction-type="RESOURCE_LOCAL"> + <provider>org.eclipse.persistence.jpa.PersistenceProvider</provider> + <class>org.apache.polaris.jpa.models.ModelEntity</class> + <class>org.apache.polaris.jpa.models.ModelEntityActive</class> + <class>org.apache.polaris.jpa.models.ModelEntityChangeTracking</class> + <class>org.apache.polaris.jpa.models.ModelEntityDropped</class> + <class>org.apache.polaris.jpa.models.ModelGrantRecord</class> + <class>org.apache.polaris.jpa.models.ModelPrincipalSecrets</class> + <class>org.apache.polaris.jpa.models.ModelSequenceId</class> + <shared-cache-mode>NONE</shared-cache-mode> + <properties> + <property name="jakarta.persistence.jdbc.url" + value="jdbc:postgresql://localhost:5432/{realm}"/> + <property name="jakarta.persistence.jdbc.user" value="postgres"/> + <property name="jakarta.persistence.jdbc.password" value="postgres"/> + <property name="jakarta.persistence.schema-generation.database.action" value="create"/> + <property name="eclipselink.persistence-context.flush-mode" value="auto"/> + </properties> +</persistence-unit> +``` + +To build Polaris with the necessary Postgres dependency and start the Polaris service, run the following: +```bash +polaris> ./gradlew --no-daemon --info -PeclipseLink=true -PeclipseLinkDeps=org.postgresql:postgresql:42.7.4 clean shadowJar +polaris> java -jar dropwizard/service/build/libs/polaris-dropwizard-service-*.jar server ./polaris-server.yml +``` \ No newline at end of file diff --git a/site/content/in-dev/0.9.0/overview.md b/site/content/in-dev/0.9.0/overview.md new file mode 100644 index 00000000..41f8daea --- /dev/null +++ b/site/content/in-dev/0.9.0/overview.md @@ -0,0 +1,215 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Overview +type: docs +weight: 200 +--- + +Apache Polaris (Incubating) is a catalog implementation for Apache Iceberg™ tables and is built on the open source Apache Iceberg™ REST protocol. + +With Polaris, you can provide centralized, secure read and write access to your Iceberg tables across different REST-compatible query engines. + +[Figure: Polaris overview] + +## Key concepts + +This section introduces key concepts associated with using Apache Polaris (Incubating). + +In the following diagram, a sample [Apache Polaris (Incubating) structure](#catalog) with nested [namespaces](#namespace) is shown for Catalog1. No tables +or namespaces have been created yet for Catalog2 or Catalog3. + +[Figure: sample catalog structure] + +### Catalog + +In Polaris, you can create one or more catalog resources to organize Iceberg tables. + +Configure your catalog by setting values in the storage configuration for S3, Azure, or Google Cloud Storage. An Iceberg catalog enables a +query engine to manage and organize tables. The catalog forms the first architectural layer in the [Apache Iceberg™ table specification](https://iceberg.apache.org/spec/#overview) and must support the following tasks: + +- Storing the current metadata pointer for one or more Iceberg tables. A metadata pointer maps a table name to the location of that table's + current metadata file.
+ + - Performing atomic operations so that you can update the current metadata pointer for a table to the metadata pointer of a new version of + the table. + +To learn more about Iceberg catalogs, see the [Apache Iceberg™ documentation](https://iceberg.apache.org/concepts/catalog/). + +#### Catalog types + +A catalog can be one of the following two types: + +- Internal: The catalog is managed by Polaris. Tables from this catalog can be read and written in Polaris. + +- External: The catalog is externally managed by another Iceberg catalog provider (for example, Snowflake, Glue, Dremio Arctic). Tables from + this catalog are synced to Polaris. These tables are read-only in Polaris. In the current release, only a Snowflake external catalog is provided. + +A catalog is configured with a storage configuration that can point to S3, Azure storage, or GCS. + +### Namespace + +You create *namespaces* to logically group Iceberg tables within a catalog. A catalog can have multiple namespaces. You can also create +nested namespaces. Iceberg tables belong to namespaces. + +### Apache Iceberg™ tables and catalogs + +In an internal catalog, an Iceberg table is registered in Polaris, but read and written via query engines. The table data and +metadata are stored in your external cloud storage. The table uses Polaris as the Iceberg catalog. + +If you have tables housed in another Iceberg catalog, you can sync that catalog to Polaris, where it appears as an external catalog. +Clients connecting directly to the other catalog can read from or write to these tables. However, clients connecting to Polaris will only be able to +read from these tables. + +> **Important** +> +> For the access privileges defined for a catalog to be enforced correctly, the following conditions must be met: +> +> - The directory only contains the data files that belong to a single table.
+> - The directory hierarchy matches the namespace hierarchy for the catalog. +> +> For example, if a catalog includes the following items: +> +> - Top-level namespace namespace1 +> - Nested namespace namespace1a +> - A customers table, which is grouped under nested namespace namespace1a +> - An orders table, which is grouped under nested namespace namespace1a +> +> The directory hierarchy for the catalog must follow this structure: +> +> - /namespace1/namespace1a/customers/<files for the customers table *only*> +> - /namespace1/namespace1a/orders/<files for the orders table *only*> + +### Service principal + +A service principal is an entity that you create in Polaris. Each service principal encapsulates credentials that you use to connect +to Polaris. + +Query engines use service principals to connect to catalogs. + +Polaris generates a Client ID and Client Secret pair for each service principal. + +The following table displays example service principals that you might create in Polaris: + + | Service connection name | Purpose | + | --------------------------- | ----------- | + | Flink ingestion | For Apache Flink® to ingest streaming data into Apache Iceberg™ tables. | + | Spark ETL pipeline | For Apache Spark™ to run ETL pipeline jobs on Iceberg tables. | + | Snowflake data pipelines | For Snowflake to run data pipelines for transforming data in Apache Iceberg™ tables. | + | Trino BI dashboard | For Trino to run BI queries for powering a dashboard. | + | Snowflake AI team | For Snowflake to run AI jobs on data in Apache Iceberg™ tables. | + +### Service connection + +A service connection represents a REST-compatible engine (such as Apache Spark™, Apache Flink®, or Trino) that can read from and write to Polaris +Catalog. When creating a new service connection, the Polaris administrator grants the service principal that is created with the new service +connection either a new or existing principal role. 
A principal role is a resource in Polaris that you can use to logically group Polaris +service principals together and grant privileges on securable objects. For more information, see [Principal role]({{% ref "access-control#principal-role" %}}). Polaris uses a role-based access control (RBAC) model to grant service principals access to resources. For more information, +see [Access control]({{% ref "access-control" %}}). For a diagram of this model, see [RBAC model]({{% ref "access-control#rbac-model" %}}). + +If the Polaris administrator grants the service principal for the new service connection a new principal role, the service principal +doesn't have any privileges granted to it yet. When securing the catalog that the new service connection will connect to, the Polaris +administrator grants privileges to catalog roles and then grants these catalog roles to the new principal role. As a result, the service +principal for the new service connection has these privileges. For more information about catalog roles, see [Catalog role]({{% ref "access-control#catalog-role" %}}). + +If the Polaris administrator grants an existing principal role to the service principal for the new service connection, the service principal +has the same privileges granted to the catalog roles that are granted to the existing principal role. If needed, the Polaris +administrator can grant additional catalog roles to the existing principal role or remove catalog roles from it to adjust the privileges +granted to the service principal. For an example of how RBAC works in Polaris, see [RBAC example]({{% ref "access-control#rbac-example" %}}). + +### Storage configuration + +A storage configuration stores a generated identity and access management (IAM) entity for your external cloud storage and is created +when you create a catalog. The storage configuration is used to set the values to connect Polaris to your cloud storage.
During the +catalog creation process, an IAM entity is generated and used to create a trust relationship between the cloud storage provider and Polaris +Catalog. + +When you create a catalog, you supply the following information about your external cloud storage: + +| Cloud storage provider | Information | +| -----------------------| ----------- | +| Amazon S3 | <ul><li>Default base location for your Amazon S3 bucket</li><li>Locations for your Amazon S3 bucket</li><li>S3 role ARN</li><li>External ID (optional)</li></ul> | +| Google Cloud Storage (GCS) | <ul><li>Default base location for your GCS bucket</li><li>Locations for your GCS bucket</li></ul> | +| Azure | <ul><li>Default base location for your Microsoft Azure container</li><li>Locations for your Microsoft Azure container</li><li>Azure tenant ID</li></ul> | + +## Example workflow + +In the following example workflow, Bob creates an Apache Iceberg™ table named Table1 and Alice reads data from Table1. + +1. Bob uses Apache Spark™ to create the Table1 table under the + Namespace1 namespace in the Catalog1 catalog and insert values into + Table1. + + Bob can create Table1 and insert data into it because he is using a + service connection with a service principal that has + the privileges to perform these actions. + +2. Alice uses Snowflake to read data from Table1. + + Alice can read data from Table1 because she is using a service + connection with a service principal with a catalog integration that + has the privileges to perform this action. Alice + creates an unmanaged table in Snowflake to read data from Table1. + +## Security and access control + +This section describes security and access control. + +### Credential vending + +To secure interactions with service connections, Polaris vends temporary storage credentials to the query engine during query +execution. These credentials allow the query engine to run the query without requiring access to your external cloud storage for +Iceberg tables.
This process is called credential vending. + +As of now, the following limitation is known regarding Apache Iceberg support: + +- **remove_orphan_files:** Apache Spark can't use credential vending + for this due to a known issue. See [apache/iceberg#7914](https://github.com/apache/iceberg/pull/7914) for details. + +### Identity and access management (IAM) + +Polaris uses the identity and access management (IAM) entity to securely connect to your storage for accessing table data, Iceberg +metadata, and manifest files that store the table schema, partitions, and other metadata. Polaris retains the IAM entity for your +storage location. + +### Access control + +Polaris enforces the access control that you configure across all tables registered with the service and governs security for all +queries from query engines in a consistent manner. + +Polaris uses a role-based access control (RBAC) model that lets you centrally configure access for Polaris service principals to catalogs, +namespaces, and tables. + +Polaris RBAC uses two different role types to delegate privileges: + +- **Principal roles:** Granted to Polaris service principals and + analogous to roles in other access control systems that you grant to + service principals. + +- **Catalog roles:** Configured with certain privileges on Polaris + catalog resources and granted to principal roles. + +For more information, see [Access control]({{% ref "access-control" %}}). + +## Legal Notices + +Apache®, Apache Iceberg™, Apache Spark™, Apache Flink®, and Flink® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. 
diff --git a/site/content/in-dev/0.9.0/polaris-management-service.md b/site/content/in-dev/0.9.0/polaris-management-service.md new file mode 100644 index 00000000..0b66b9da --- /dev/null +++ b/site/content/in-dev/0.9.0/polaris-management-service.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Apache Polaris Management Service OpenAPI' +linkTitle: 'Management OpenAPI' +weight: 800 +params: + show_page_toc: false +--- + +{{< redoc-polaris "polaris-management-service.yml" >}} diff --git a/site/content/in-dev/0.9.0/quickstart.md b/site/content/in-dev/0.9.0/quickstart.md new file mode 100644 index 00000000..57f8e767 --- /dev/null +++ b/site/content/in-dev/0.9.0/quickstart.md @@ -0,0 +1,332 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +Title: Quick Start +type: docs +weight: 100 +--- + +This guide serves as an introduction to several key entities that can be managed with Apache Polaris (Incubating), describes how to build and deploy Polaris locally, and finally includes examples of how to use Polaris with Apache Spark™. + +## Prerequisites + +This guide covers building Polaris, deploying it locally or via [Docker](https://www.docker.com/), and interacting with it using the command-line interface and [Apache Spark](https://spark.apache.org/). Before proceeding with Polaris, be sure to satisfy the relevant prerequisites listed here. + +### Building and Deploying Polaris + +To get the latest Polaris code, you'll need to clone the repository using [git](https://git-scm.com/). You can install git using [homebrew](https://brew.sh/): + +```shell +brew install git +``` + +Then, use git to clone the Polaris repo: + +```shell +cd ~ +git clone https://github.com/apache/polaris.git +``` + +#### With Docker + +If you plan to deploy Polaris inside [Docker](https://www.docker.com/), you'll need to install Docker itself. For example, this can be done using [homebrew](https://brew.sh/): + +```shell +brew install --cask docker +``` + +Once installed, make sure Docker is running. + +#### From Source + +If you plan to build Polaris from source yourself, you will need to satisfy a few prerequisites first. + +Polaris is built using [gradle](https://gradle.org/) and is compatible with Java 21. We recommend the use of [jenv](https://www.jenv.be/) to manage multiple Java versions.
For example, to install Java 21 via [homebrew](https://brew.sh/) and configure it with jenv: + +```shell +cd ~/polaris +brew install openjdk@21 jenv +jenv add $(brew --prefix openjdk@21) +jenv local 21 +``` + +### Connecting to Polaris + +Polaris is compatible with any [Apache Iceberg](https://iceberg.apache.org/) client that supports the REST API. Depending on the client you plan to use, refer to the prerequisites below. + +#### With Spark + +If you want to connect to Polaris with [Apache Spark](https://spark.apache.org/), you'll need to start by cloning Spark. As [above](#building-and-deploying-polaris), make sure [git](https://git-scm.com/) is installed first. You can install it with [homebrew](https://brew.sh/): + +```shell +brew install git +``` + +Then, clone Spark and check out a versioned branch. This guide uses [Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html). + +```shell +cd ~ +git clone https://github.com/apache/spark.git +cd ~/spark +git checkout branch-3.5 +``` + +## Deploying Polaris + +Polaris can be deployed via a lightweight Docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant [prerequisites](#building-and-deploying-polaris) detailed above. + +### Docker Image + +To start using Polaris in Docker, launch Polaris while Docker is running: + +```shell +cd ~/polaris +docker compose -f docker-compose.yml up --build +``` + +Once the `polaris-polaris` container is up, you can continue to [Defining a Catalog](#defining-a-catalog). + +### Building Polaris + +Run Polaris locally with: + +```shell +cd ~/polaris +./gradlew runApp +``` + +You should see output for some time as Polaris builds and starts up. Eventually, the output will stop, and the last messages should resemble the following: + +``` +INFO [...] [main] [] o.e.j.s.handler.ContextHandler: Started i.d.j.MutableServletContextHandler@... +INFO [...] [main] [] o.e.j.server.AbstractConnector: Started application@... +INFO [...]
[main] [] o.e.j.server.AbstractConnector: Started admin@... +INFO [...] [main] [] o.eclipse.jetty.server.Server: Started Server@... +``` + +At this point, Polaris is running. + +## Bootstrapping Polaris + +For this tutorial, we'll launch an instance of Polaris that stores entities only in-memory. This means that any entities that you define will be destroyed when Polaris is shut down. It also means that Polaris will automatically bootstrap itself with root credentials. For more information on how to configure Polaris for production usage, see the [docs]({{% ref "configuring-polaris-for-production" %}}). + +When Polaris is launched in in-memory mode, the root principal credentials are printed to stdout on initial startup. For example: + +``` +realm: default-realm root principal credentials: <client-id>:<client-secret> +``` + +Be sure to take note of these credentials, as we'll be using them below. You can also set these credentials as environment variables for use with the Polaris CLI: + +```shell +export CLIENT_ID=<client-id> +export CLIENT_SECRET=<client-secret> +``` + +## Defining a Catalog + +In Polaris, the [catalog]({{% ref "entities#catalog" %}}) is the top-level entity that objects like [tables]({{% ref "entities#table" %}}) and [views]({{% ref "entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so: + +```shell +cd ~/polaris + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalogs \ + create \ + --storage-type s3 \ + --default-base-location ${DEFAULT_BASE_LOCATION} \ + --role-arn ${ROLE_ARN} \ + quickstart_catalog +``` + +This will create a new catalog called **quickstart_catalog**. + +The `DEFAULT_BASE_LOCATION` you provide will be the default location that objects in this catalog should be stored in, and the `ROLE_ARN` you provide should be a [Role ARN](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html) with access to read and write data in that location.
These credentials will be provided to engines reading data from the catalog once they have authenticated with Polaris using credentials that have access to those resources. + +If you’re using a storage type other than S3, such as Azure, you’ll provide a different type of credential than a Role ARN. For more details on supported storage types, see the [docs]({{% ref "entities#storage-type" %}}). + +Additionally, if Polaris is running somewhere other than `localhost:8181`, you can specify the correct hostname and port by providing `--host` and `--port` flags. For the full set of options supported by the CLI, please refer to the [docs]({{% ref "command-line-interface" %}}). + + +### Creating a Principal and Assigning it Privileges + +With a catalog created, we can create a [principal]({{% ref "entities#principal" %}}) that has access to manage that catalog. For details on how to configure the Polaris CLI, see [the section above](#defining-a-catalog) or refer to the [docs]({{% ref "command-line-interface" %}}). + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principals \ + create \ + quickstart_user + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + create \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + create \ + --catalog quickstart_catalog \ + quickstart_catalog_role +``` + +Be sure to provide the necessary credentials, hostname, and port as before. + +When the `principals create` command completes successfully, it will return the credentials for this new principal. Be sure to note these down for later. For example: + +``` +./polaris ... 
principals create example +{"clientId": "XXXX", "clientSecret": "YYYY"} +``` + +Now, we grant the principal the [principal role]({{% ref "entities#principal-role" %}}) we created, and grant the [catalog role]({{% ref "entities#catalog-role" %}}) we created to that principal role. For more information on these entities, please refer to the linked documentation. + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + principal-roles \ + grant \ + --principal quickstart_user \ + quickstart_user_role + +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + catalog-roles \ + grant \ + --catalog quickstart_catalog \ + --principal-role quickstart_user_role \ + quickstart_catalog_role +``` + +Now, we’ve linked our principal to the catalog via roles like so: + + + +In order to give this principal the ability to interact with the catalog, we must assign some [privileges]({{% ref "entities#privilege" %}}). For the time being, we will give this principal the ability to fully manage content in our new catalog. We can do this with the CLI like so: + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + grant \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +This grants the [catalog privilege]({{% ref "entities#privilege" %}}) `CATALOG_MANAGE_CONTENT` to our catalog role, linking everything together like so: + + + +`CATALOG_MANAGE_CONTENT` has create/list/read/write privileges on all entities within the catalog. The same privilege could be granted to a namespace, in which case the principal could create/list/read/write any entity under that namespace. + +## Using Iceberg & Polaris + +At this point, we’ve created a principal and granted it the ability to manage a catalog.
We can now use an external engine to assume that principal, access our catalog, and store data in that catalog using [Apache Iceberg](https://iceberg.apache.org/). + +### Connecting with Spark + +To use a Polaris-managed catalog in [Apache Spark](https://spark.apache.org/), we can configure Spark to use the Iceberg catalog REST API. + +This guide uses [Apache Spark 3.5](https://spark.apache.org/releases/spark-release-3-5-0.html), but be sure to find [the appropriate iceberg-spark package for your Spark version](https://mvnrepository.com/artifact/org.apache.iceberg/iceberg-spark). From a local Spark clone on the `branch-3.5` branch, we can run the following: + +_Note: the credentials provided here are those for our principal, not the root credentials._ + +```shell +bin/spark-shell \ +--packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,org.apache.hadoop:hadoop-aws:3.4.0 \ +--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \ +--conf spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog \ +--conf spark.sql.catalog.quickstart_catalog.header.X-Iceberg-Access-Delegation=vended-credentials \ +--conf spark.sql.catalog.quickstart_catalog=org.apache.iceberg.spark.SparkCatalog \ +--conf spark.sql.catalog.quickstart_catalog.catalog-impl=org.apache.iceberg.rest.RESTCatalog \ +--conf spark.sql.catalog.quickstart_catalog.uri=http://localhost:8181/api/catalog \ +--conf spark.sql.catalog.quickstart_catalog.credential='XXXX:YYYY' \ +--conf spark.sql.catalog.quickstart_catalog.scope='PRINCIPAL_ROLE:ALL' \ +--conf spark.sql.catalog.quickstart_catalog.token-refresh-enabled=true +``` + + +Replace `XXXX` and `YYYY` with the client ID and client secret generated when you created the `quickstart_user` principal. + +Similar to the CLI commands above, this configures Spark to use the Polaris running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.
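If you start Spark against more than one Polaris deployment, generating these flags can be less error-prone than editing them by hand. The following is a minimal Bash sketch and is not part of the Polaris tooling. The function name `polaris_spark_confs` and the `POLARIS_SPARK_CONFS` array are hypothetical. The configuration keys and values are copied from the `spark-shell` command above, and only the catalog name, the host, the port, and the principal credential are parameterized.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Polaris): rebuilds the spark-shell --conf
# flags shown above for a given catalog, Polaris host/port, and principal credential.
# Usage: polaris_spark_confs <catalog> <host> <port> <client-id>:<client-secret>
# The flags are stored in the global Bash array POLARIS_SPARK_CONFS.
polaris_spark_confs() {
  local catalog="$1" host="$2" port="$3" credential="$4"
  local prefix="spark.sql.catalog.${catalog}"
  # As in the command above, the warehouse is the catalog name, and Polaris
  # serves the Iceberg REST API under /api/catalog.
  POLARIS_SPARK_CONFS=(
    --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions"
    --conf "${prefix}.warehouse=${catalog}"
    --conf "${prefix}.header.X-Iceberg-Access-Delegation=vended-credentials"
    --conf "${prefix}=org.apache.iceberg.spark.SparkCatalog"
    --conf "${prefix}.catalog-impl=org.apache.iceberg.rest.RESTCatalog"
    --conf "${prefix}.uri=http://${host}:${port}/api/catalog"
    --conf "${prefix}.credential=${credential}"
    --conf "${prefix}.scope=PRINCIPAL_ROLE:ALL"
    --conf "${prefix}.token-refresh-enabled=true"
  )
}
```

For example, after running `polaris_spark_confs quickstart_catalog localhost 8181 "XXXX:YYYY"`, you could start Spark with `bin/spark-shell --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1,org.apache.hadoop:hadoop-aws:3.4.0 "${POLARIS_SPARK_CONFS[@]}"`. Because the flags are kept in an array rather than a single string, each value reaches Spark as exactly one argument.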
+ +Finally, note that we include the `hadoop-aws` package here. If your table is using a different filesystem, be sure to include the appropriate dependency. + +Once the Spark session starts, we can create a namespace and table within the catalog: + +``` +spark.sql("USE quickstart_catalog") +spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace") +spark.sql("CREATE NAMESPACE IF NOT EXISTS quickstart_namespace.schema") +spark.sql("USE NAMESPACE quickstart_namespace.schema") +spark.sql(""" + CREATE TABLE IF NOT EXISTS quickstart_table ( + id BIGINT, data STRING + ) +USING ICEBERG +""") +``` + +We can now use this table like any other: + +``` +spark.sql("INSERT INTO quickstart_table VALUES (1, 'some data')") +spark.sql("SELECT * FROM quickstart_table").show(false) +. . . ++---+---------+ +|id |data | ++---+---------+ +|1 |some data| ++---+---------+ +``` + +If at any time access is revoked... + +```shell +./polaris \ + --client-id ${CLIENT_ID} \ + --client-secret ${CLIENT_SECRET} \ + privileges \ + catalog \ + revoke \ + --catalog quickstart_catalog \ + --catalog-role quickstart_catalog_role \ + CATALOG_MANAGE_CONTENT +``` + +Spark will lose access to the table: + +``` +spark.sql("SELECT * FROM quickstart_table").show(false) + +org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quickstart_user' with activated PrincipalRoles '[]' and activated ids '[6, 7]' is not authorized for op LOAD_TABLE_WITH_READ_DELEGATION +``` diff --git a/site/content/in-dev/0.9.0/rest-catalog-open-api.md b/site/content/in-dev/0.9.0/rest-catalog-open-api.md new file mode 100644 index 00000000..ecb43a83 --- /dev/null +++ b/site/content/in-dev/0.9.0/rest-catalog-open-api.md @@ -0,0 +1,27 @@ +--- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. 
The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. +# +title: 'Apache Iceberg OpenAPI' +linkTitle: 'Iceberg OpenAPI' +weight: 900 +params: + show_page_toc: false +--- + +{{< redoc-polaris "rest-catalog-open-api.yaml" >}} diff --git a/site/hugo.yaml b/site/hugo.yaml index 09934213..3ee235bd 100644 --- a/site/hugo.yaml +++ b/site/hugo.yaml @@ -96,7 +96,7 @@ menu: - name: GitHub url: https://github.com/apache/polaris - - name: "Docs & Releases" + - name: "Documentation" identifier: "releases" weight: 100 params: @@ -105,6 +105,9 @@ menu: identifier: "all-releases-page" url: "/releases/" parent: "releases" + - name: "0.9.0" + url: "/in-dev/0.9.0/" + parent: "releases" - name: "In Development" url: "/in-dev/unreleased/" parent: "releases"