This is an automated email from the ASF dual-hosted git repository.

adutra pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/polaris-tools.git


The following commit(s) were added to refs/heads/main by this push:
     new 601fee2  docs(iceberg-catalog-migrator): Updating the Iceberg Catalog 
Migrator documentation (#42)
601fee2 is described below

commit 601fee246b7e917b420e5a1a1f73da4770fe9c45
Author: Adam Christian 
<[email protected]>
AuthorDate: Thu Nov 13 13:51:29 2025 -0500

    docs(iceberg-catalog-migrator): Updating the Iceberg Catalog Migrator 
documentation (#42)
---
 iceberg-catalog-migrator/README.md                 | 377 +--------------------
 iceberg-catalog-migrator/docs/examples.md          | 122 +++++++
 iceberg-catalog-migrator/docs/getting-started.md   | 116 +++++++
 .../docs/object-store-access-configuration.md      |  36 ++
 iceberg-catalog-migrator/docs/troubleshooting.md   |  37 ++
 5 files changed, 322 insertions(+), 366 deletions(-)

diff --git a/iceberg-catalog-migrator/README.md 
b/iceberg-catalog-migrator/README.md
index 4128f4d..04fee21 100644
--- a/iceberg-catalog-migrator/README.md
+++ b/iceberg-catalog-migrator/README.md
@@ -17,379 +17,24 @@
   - under the License.
   -->
 
-# Objective 
-Introduce a command-line tool that enables bulk migration of Iceberg tables 
from one catalog to another without the need to copy the data.
+# Iceberg Catalog Migrator
+ 
+The Iceberg Catalog Migrator is a command-line tool that enables bulk migration of [Apache Iceberg](https://iceberg.apache.org/) tables from one [Iceberg Catalog](https://iceberg.apache.org/rest-catalog-spec/) to another without the need to copy the data. This tool works with all Iceberg Catalogs, not just Polaris.
 
-There are various reasons why users may want to move their Iceberg tables to a 
different catalog. For instance,
-* They were using hadoop catalog and later realized that it is not production 
recommended. So, they want to move tables to other production ready catalogs.
-* They just heard about the awesome Apache Polaris catalog and want to move 
their existing iceberg tables to Apache Polaris catalog.
-* They had an on-premise Hive catalog, but want to move tables to a 
cloud-based catalog as part of their cloud migration strategy.
-
-The CLI tool should support two commands
-* migrate - To bulk migrate the iceberg tables from source catalog to target 
catalog without data copy. 
-Table entries from source catalog will be deleted after the successful 
migration to the target catalog.
-* register - To bulk register the iceberg tables from source catalog to target 
catalog without data copy. 
+The migrator tool provides two operations:
+* Migrate - Bulk migrate Iceberg tables from the source catalog to the target catalog. Table entries are deleted from the source catalog after a successful migration.
+* Register - Bulk register Iceberg tables from the source catalog to the target catalog.
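+
+As a quick orientation, both operations are exposed as CLI subcommands, and each prints its full option list with `-h` (JAR name as used in the examples in this repository; substitute your build's version):
+
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate -h
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar register -h
+```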
 
 > :warning: The `register` command just registers the table, which means the table will be present in both catalogs after registering.
-**Operating same table from more than one catalog can lead to missing updates, 
loss of data and table corruption.
-So, it is recommended to use the 'migrate' command in CLI to automatically 
delete the table from source catalog after registering
+**Operating on the same table from more than one catalog can lead to missing updates, loss of data, and table corruption.
+It is recommended to use the 'migrate' command to automatically delete the table from the source catalog after registering,
 or to avoid operating on tables from the source catalog after registering if the 'migrate' command is not used.**
 
-> :warning: **Avoid using this CLI tool when there are in-progress commits for 
tables in the source catalog 
+> :warning: **Avoid using this tool when there are in-progress commits for 
tables in the source catalog 
 to prevent missing updates, data loss, and table corruption in the target catalog.
 In-progress commits may not be properly transferred and could compromise the 
integrity of your data.**
 
-# Iceberg-catalog-migrator
-Need to have Java installed in your machine (Java 21 is recommended and the 
minimum Java version) to use this CLI tool.
-
-Below is the CLI syntax:
-```
-$ java -jar iceberg-catalog-migrator-cli-0.0.1.jar -h        
-Usage: iceberg-catalog-migrator [-hV] [COMMAND]
-  -h, --help      Show this help message and exit.
-  -V, --version   Print version information and exit.
-Commands:
-  migrate   Bulk migrate the iceberg tables from source catalog to target 
catalog without data copy. Table entries from source catalog will be
-              deleted after the successful migration to the target catalog.
-  register  Bulk register the iceberg tables from source catalog to target 
catalog without data copy.
-```
-
-```
-$ java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate -h
-Usage: iceberg-catalog-migrator migrate [-hV] [--disable-safety-prompts] 
[--dry-run] [--stacktrace] [--output-dir=<outputDirPath>]
-                                        (--source-catalog-type=<type> 
--source-catalog-properties=<String=String>[,<String=String>...]
-                                        
[--source-catalog-properties=<String=String>[,<String=String>...]]...
-                                        
[--source-catalog-hadoop-conf=<String=String>[,<String=String>...]]...
-                                        
[--source-custom-catalog-impl=<customCatalogImpl>]) 
(--target-catalog-type=<type>
-                                        
--target-catalog-properties=<String=String>[,<String=String>...] 
[--target-catalog-properties=<String=String>
-                                        [,<String=String>...]]... 
[--target-catalog-hadoop-conf=<String=String>[,<String=String>...]]...
-                                        
[--target-custom-catalog-impl=<customCatalogImpl>]) 
[--identifiers=<identifiers>[,<identifiers>...]
-                                        
[--identifiers=<identifiers>[,<identifiers>...]]... | 
--identifiers-from-file=<identifiersFromFile> |
-                                        --identifiers-regex=<identifiersRegEx>]
-Bulk migrate the iceberg tables from source catalog to target catalog without 
data copy. Table entries from source catalog will be deleted after the
-successful migration to the target catalog.
-      --output-dir=<outputDirPath>
-                     Optional local output directory path to write CLI output 
files like `failed_identifiers.txt`, `failed_to_delete_at_source.txt`,
-                       `dry_run_identifiers.txt`. If not specified, uses the 
present working directory.
-                     Example: --output-dir /tmp/output/
-                              --output-dir $PWD/output_folder
-      --dry-run      Optional configuration to simulate the registration 
without actually registering. Can learn about a list of tables that will be
-                       registered by running this.
-      --disable-safety-prompts
-                     Optional configuration to disable safety prompts which 
needs console input.
-      --stacktrace   Optional configuration to enable capturing stacktrace in 
logs in case of failures.
-  -h, --help         Show this help message and exit.
-  -V, --version      Print version information and exit.
-Source catalog options:
-      --source-catalog-type=<type>
-                     Source catalog type. Can be one of these [CUSTOM, 
DYNAMODB, ECS, GLUE, HADOOP, HIVE, JDBC, NESSIE, REST].
-                     Example: --source-catalog-type GLUE
-                              --source-catalog-type NESSIE
-      --source-catalog-properties=<String=String>[,<String=String>...]
-                     Iceberg catalog properties for source catalog (like uri, 
warehouse, etc).
-                     Example: --source-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouseNessie
-      --source-catalog-hadoop-conf=<String=String>[,<String=String>...]
-                     Optional source catalog Hadoop configurations required by 
the Iceberg catalog.
-                     Example: --source-catalog-hadoop-conf 
key1=value1,key2=value2
-      --source-custom-catalog-impl=<customCatalogImpl>
-                     Optional fully qualified class name of the custom catalog 
implementation of the source catalog. Required when the catalog type
-                       is CUSTOM.
-                     Example: --source-custom-catalog-impl 
org.apache.iceberg.AwesomeCatalog
-Target catalog options:
-      --target-catalog-type=<type>
-                     Target catalog type. Can be one of these [CUSTOM, 
DYNAMODB, ECS, GLUE, HADOOP, HIVE, JDBC, NESSIE, REST].
-                     Example: --target-catalog-type GLUE
-                              --target-catalog-type NESSIE
-      --target-catalog-properties=<String=String>[,<String=String>...]
-                     Iceberg catalog properties for target catalog (like uri, 
warehouse, etc).
-                     Example: --target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouseNessie
-      --target-catalog-hadoop-conf=<String=String>[,<String=String>...]
-                     Optional target catalog Hadoop configurations required by 
the Iceberg catalog.
-                     Example: --target-catalog-hadoop-conf 
key1=value1,key2=value2
-      --target-custom-catalog-impl=<customCatalogImpl>
-                     Optional fully qualified class name of the custom catalog 
implementation of the target catalog. Required when the catalog type
-                       is CUSTOM.
-                     Example: --target-custom-catalog-impl 
org.apache.iceberg.AwesomeCatalog
-Identifier options:
-      --identifiers=<identifiers>[,<identifiers>...]
-                     Optional selective set of identifiers to register. If not 
specified, all the tables will be registered. Use this when there are
-                       few identifiers that need to be registered. For a large 
number of identifiers, use the `--identifiers-from-file` or
-                       `--identifiers-regex` option.
-                     Example: --identifiers foo.t1,bar.t2
-      --identifiers-from-file=<identifiersFromFile>
-                     Optional text file path that contains a set of table 
identifiers (one per line) to register. Should not be used with
-                       `--identifiers` or `--identifiers-regex` option.
-                     Example: --identifiers-from-file /tmp/files/ids.txt
-      --identifiers-regex=<identifiersRegEx>
-                     Optional regular expression pattern used to register only 
the tables whose identifiers match this pattern. Should not be used
-                       with `--identifiers` or '--identifiers-from-file' 
option.
-                     Example: --identifiers-regex ^foo\..*
-```
-
-Note: Options for register command is exactly same as migrate command.
-
-# Sample Inputs
-
-Note: 
-a) Before migrating tables to Apache polaris, Make sure the catalog instance 
is configured to the `base-location`
-same as source catalog `warehouse` location during catalog creation.
-
-```
-{
-  "catalog": {
-    "name": "test",
-    "type": "INTERNAL",
-    "readOnly": false,
-    "properties": {
-      "default-base-location": "file:/path/to/source_catalog"
-    },
-    "storageConfigInfo": {
-      "storageType": "FILE",
-      "allowedLocations": [
-        "file:/path/to/source_catalog"
-      ]
-    }
-  }
-}
-```
-
-b) Get the Oauth token and export it to the local variable
-
-```shell
-curl -X POST http://localhost:8181/api/catalog/v1/oauth/tokens \
--d "grant_type=client_credentials" \
--d "client_id=my-client-id" \
--d "client_secret=my-client-secret" \
--d "scope=PRINCIPAL_ROLE:ALL"
-
-export TOKEN=xxxxxxx
-```
-
-c) Also export the required storage related configs and use them respectively 
for catalog configuration. 
-For s3,
-
-```shell
-export AWS_ACCESS_KEY_ID=xxxxxxx
-export AWS_SECRET_ACCESS_KEY=xxxxxxx
-export AWS_S3_ENDPOINT=xxxxxxx
-```
-
-for ADLS,
-```shell
-export AZURE_SAS_TOKEN=<token>
-```
-
-## Bulk registering all the tables from Hadoop catalog to Polaris catalog
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar register \
---source-catalog-type HADOOP \
---source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
---target-catalog-type REST  \
---target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN 
-```
-
-## Migrate selected tables (t1,t2 in namespace foo) from Hadoop catalog to 
Polaris catalog
-
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HADOOP \
---source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
---target-catalog-type REST  \
---target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN \
---identifiers foo.t1,foo.t2
-```
-
-## Migrate all tables from GLUE catalog to Polaris catalog
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type GLUE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
---target-catalog-type REST  \
---target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
-```
-
-## Migrate all tables from HIVE catalog to Polaris catalog
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
---target-catalog-type REST  \
---target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
-```
-
-Note: Need to configure `ALLOW_UNSTRUCTURED_TABLE_LOCATION` property at the 
polaris server side as
-HMS creates a namespace folder with ".db" extension. Also need to configure 
`allowedLocations` to be 
-source catalog directory in `storage_configuration_info`.
-
-## Migrate all tables from DYNAMODB catalog to Polaris catalog
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type DYNAMODB \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
---target-catalog-type REST  \
---target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
-```
-
-## Migrate all tables from JDBC catalog to Polaris catalog
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \ 
---source-catalog-type JDBC \
---source-catalog-properties 
warehouse=/tmp/warehouseJdbc,jdbc.user=root,jdbc.password=pass,uri=jdbc:mysql://localhost:3306/db1,name=catalogName
 \
---target-catalog-type REST  \
---target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
-```
-
-# Scenarios
-## A. User wants to try out a new catalog
-Users can use a new catalog by creating a fresh table to test the new 
catalog's capabilities.
-
-## B. Users wants to move the tables from one catalog (example: Hive) to 
another (example: Nessie).
-
-### B.1) Executes `--dry-run` option to check which tables will get migrated.
-
-Sample input:
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
---target-catalog-type NESSIE  \
---target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
---dry-run
-```
-
-After validating all inputs, the console will display a list of table 
identifiers, that are identified for migration, along with the total count. 
-This information will also be written to a file called `dry_run.txt`, 
-The list of table identifiers in `dry_run.txt` can be altered (if needed) and 
reused for the actual migration using the `--identifiers-from-file` option; 
thus eliminating the need for the tool to list the tables from the catalog in 
the actual run.
-
-### B.2) Executes the migration of all 1000 tables and all the tables are 
successfully migrated.
-
-Sample input:
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
---target-catalog-type NESSIE  \
---target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse
-```
-
-After input validation, users will receive a prompt message with the option to 
either abort or continue the operation.
-
-```
-WARN  - User has not specified the table identifiers. Will be selecting all 
the tables from all the namespaces from the source catalog.
-INFO  - Configured source catalog: SOURCE_CATALOG_HIVE
-INFO  - Configured target catalog: TARGET_CATALOG_NESSIE
-WARN  - 
-       a) Executing catalog migration when the source catalog has some 
in-progress commits 
-       can lead to a data loss as the in-progress commits will not be 
considered for migration. 
-       So, while using this tool please make sure there are no in-progress 
commits for the source catalog.
-
-       b) After the migration, successfully migrated tables will be deleted 
from the source catalog 
-       and can only be accessed from the target catalog.
-INFO  - Are you certain that you wish to proceed, after reading the above 
warnings? (yes/no):
-```
-
-If the user chooses to continue, additional information will be displayed on 
the console.
-
-```
-INFO  - Continuing...
-INFO  - Identifying tables for migration ...
-INFO  - Identified 1000 tables for migration.
-INFO  - Started migration ...
-INFO  - Attempted Migration for 100 tables out of 1000 tables.
-INFO  - Attempted Migration for 200 tables out of 1000 tables.
-.
-.
-.
-INFO  - Attempted Migration for 900 tables out of 1000 tables.
-INFO  - Attempted Migration for 1000 tables out of 1000 tables.
-INFO  - Finished migration ...
-INFO  - Summary:
-INFO  - Successfully migrated 1000 tables from HIVE catalog to NESSIE catalog.
-INFO  - Details:
-INFO  - Successfully migrated these tables:
-[foo.tbl-1, foo.tbl-2, bar.tbl-4, bar.tbl-3, …, …,bar.tbl-1000]
-```
-
-Please note that a log file will be created, which will print "successfully 
migrated table X" for every table migration, 
-and also log any table level failures, if present.
-
-### B.3) Executes the migration and out of 1000 tables 10 tables have failed 
to migrate because of some error. Remaining 990 tables were successfully 
migrated.
-
-Sample input:
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
---target-catalog-type NESSIE  \
---target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
---stacktrace
-```
-
-Console output will be same as B.2) till summary because even in case of 
failure,
-all the identified tables will be attempted for migration.
-
-```
-INFO  - Summary:
-INFO  - Successfully migrated 990 tables from HIVE catalog to NESSIE catalog.
-ERROR - Failed to migrate 10 tables from HIVE catalog to NESSIE catalog. 
Please check the `catalog_migration.log` file for the failure reason.
-Failed Identifiers are written to `failed_identifiers.txt`. Retry with that 
file using the `--identifiers-from-file` option if the failure is because of 
network/connection timeouts.
-INFO  - Details:
-INFO  - Successfully migrated these tables:
-[foo.tbl-1, foo.tbl-2, bar.tbl-4, bar.tbl-3, …, …,bar.tbl-1000]
-ERROR  - Failed to migrate these tables:
-[bar.tbl-201, foo.tbl-202, …, …,bar.tbl-210]
-```
-
-Please note that a log file will be generated, which will print "successfully 
migrated table X" for every table migration and log any table-level failures in 
the `failed_identifiers.txt` file.
-Users can use this file to identify failed tables and search for them in the 
log, which will contain the exception stacktrace for those 10 tables. 
-This can help users understand why the migration failed. 
-* If the migration of those tables failed with `TableAlreadyExists` exception, 
users can rename the tables in the source catalog and migrate only those 10 
tables using any of the identifier options available in the argument.
-* If the migration of those tables failed with `ConnectionTimeOut` exception, 
users can retry migrating only those 10 tables using the 
`--identifiers-from-file` option with the `failed_identifiers.txt` file.
-* If the migration is successful but deletion of some tables form source 
catalog is failed, summary will mention that these table names were written 
into the `failed_to_delete.txt` file and logs will capture the failure reason.
-Do not operate these tables from the source catalog and user will have to 
delete them manually.
-
-### B.4)  Executes the migration and out of 1000 tables. But manually aborts 
the migration by killing the process.
-
-To determine the number of migrated tables, the user can either review the log 
or use the `listTables()` function in the target catalog. 
-In the event of an abort, migrated tables may not be deleted from the source 
catalog, and users should avoid manipulating them from there. 
-To recover, users can manually remove these tables from the source catalog or 
attempt a bulk migration to transfer all tables from the source catalog.
-
-### B.5) Users need to move away from one catalog to another with selective 
tables (maybe want to move only the production tables, test tables, etc)
-
-Users can provide the selective list of identifiers to migrate using any of 
these 3 options
-`--identifiers`, `--identifiers-from-file`, `--identifier-regex` and it can be 
used along with the dry-run option too.
-
-Sample input: (only migrate tables that starts with "foo.")
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
---target-catalog-type NESSIE  \
---target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
---identifiers-regex ^foo\..*
-
-```
-
-Sample input: (migrate all tables in the file ids.txt where each entry is 
delimited by newline)
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
---target-catalog-type NESSIE  \
---target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
---identifiers-from-file ids.txt
-```
-
-Sample input: (migrate only two tables foo.tbl1, foo.tbl2)
-```shell
-java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
---source-catalog-type HIVE \
---source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
---target-catalog-type NESSIE  \
---target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
---identifiers foo.tbl1,foo.tbl2
-```
+Please see the [getting started guide](docs/getting-started.md) for step-by-step instructions on using the tool.
 
-Console will clearly print that only these identifiers are used for table 
migration.
-Rest of the behavior will be the same as mentioned in the previous sections.
\ No newline at end of file
+Please see the [examples guide](./docs/examples.md) to learn about the different options available in the tool.
diff --git a/iceberg-catalog-migrator/docs/examples.md 
b/iceberg-catalog-migrator/docs/examples.md
new file mode 100644
index 0000000..dbc7a8a
--- /dev/null
+++ b/iceberg-catalog-migrator/docs/examples.md
@@ -0,0 +1,122 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+# Examples
+
+This document provides examples of how to use the Iceberg Catalog Migrator for 
various Iceberg Catalogs. It is broken into:
+1. [Registration Examples](#registration-examples)
+2. [Migration Examples](#migration-examples)
+3. [Tips](#tips)
+
+For more information on how to handle failures, please refer to [the troubleshooting guide](./troubleshooting.md).
+
+## Registration Examples
+Below are some examples of registering tables from one catalog to another.
+
+### Registering All Tables from Hadoop Catalog to Polaris Catalog
+
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar register \
+--source-catalog-type HADOOP \
+--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN 
+```
+
+## Migration Examples
+
+### Migrate Selected Tables from Hadoop Catalog to Polaris Catalog
+
+In this example, only tables `t1` and `t2` in namespace `foo` will be migrated.
+
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type HADOOP \
+--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN \
+--identifiers foo.t1,foo.t2
+```
+
+### Migrate from GLUE Catalog to Polaris Catalog
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type GLUE \
+--source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
+```
+
+### Migrate from HIVE Catalog to Polaris Catalog
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type HIVE \
+--source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
+```
+
+Note: You will need to configure the `ALLOW_UNSTRUCTURED_TABLE_LOCATION` property on the Polaris server side, as
+HMS creates namespace folders with a ".db" extension. In addition, you will need to configure `allowedLocations` to
+include the source catalog directory in `storage_configuration_info`.
+
+### Migrate from DYNAMODB Catalog to Polaris Catalog
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type DYNAMODB \
+--source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
+```
+
+### Migrate from JDBC Catalog to Polaris Catalog
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type JDBC \
+--source-catalog-properties 
warehouse=/tmp/warehouseJdbc,jdbc.user=root,jdbc.password=pass,uri=jdbc:mysql://localhost:3306/db1,name=catalogName
 \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN
+```
+
+### Migrate Only Tables Starting with "foo"
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type HIVE \
+--source-catalog-properties 
warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083
 \
+--target-catalog-type NESSIE  \
+--target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
+--identifiers-regex ^foo\..*
+```
+
+### Migrate Tables from a File
+The file `ids.txt` contains the table identifiers to migrate, one per line.
+
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type HIVE \
+--source-catalog-properties warehouse=/tmp/warehouse,type=hadoop \
+--target-catalog-type NESSIE  \
+--target-catalog-properties 
uri=http://localhost:19120/api/v1,ref=main,warehouse=/tmp/warehouse \
+--identifiers-from-file ids.txt
+```
+
+
+## Tips
+1. Before migrating tables to Polaris, make sure the catalog's `base-location` is set to the same location as the source catalog's `warehouse` when the catalog is created, as shown in the sketch below.
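+
+As a sketch, the previous version of this README showed a Polaris catalog-creation payload along these lines, where `default-base-location` and `allowedLocations` point at the source catalog's warehouse location (paths are illustrative):
+
+```json
+{
+  "catalog": {
+    "name": "test",
+    "type": "INTERNAL",
+    "readOnly": false,
+    "properties": {
+      "default-base-location": "file:/path/to/source_catalog"
+    },
+    "storageConfigInfo": {
+      "storageType": "FILE",
+      "allowedLocations": [
+        "file:/path/to/source_catalog"
+      ]
+    }
+  }
+}
+```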
diff --git a/iceberg-catalog-migrator/docs/getting-started.md 
b/iceberg-catalog-migrator/docs/getting-started.md
new file mode 100644
index 0000000..48bd6f9
--- /dev/null
+++ b/iceberg-catalog-migrator/docs/getting-started.md
@@ -0,0 +1,116 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+# Getting Started
+
+This document provides a step-by-step guide on how to use the Iceberg Catalog 
Migrator.
+This guide walks through an example of migrating from one Polaris catalog to another, both backed by an AWS S3 bucket.
+
+## Prerequisites
+1. Java 21 or later installed
+2. Have a target catalog created and configured
+3. Have a source catalog to migrate from
+4. Ensure there are no in-progress commits against the source catalog
+
+## Getting Started
+Migration happens in five steps:
+1. Build the Iceberg Catalog Migrator
+2. Set the object storage environment variables
+3. Get access to the source and target catalogs
+4. Validate the migration
+5. Migrate the tables
+
+### Step 1: Build the Iceberg Catalog Migrator
+Execute the following commands to build the tool:
+```shell
+git clone https://github.com/apache/polaris-tools.git 
+cd polaris-tools/iceberg-catalog-migrator
+./gradlew build
+```
+
+These commands:
+1. Clone the repository
+2. Navigate to the `iceberg-catalog-migrator` directory
+3. Build the tool
+4. Create a JAR file in the `iceberg-catalog-migrator/cli/build/libs/` directory
+
+The JAR file will be named `iceberg-catalog-migrator-cli-<version>.jar`, where `<version>` is the version of the tool found in the `iceberg-catalog-migrator/version.txt` file. For the examples below, we will assume the version is `0.0.1-SNAPSHOT`, so the JAR file name will be `iceberg-catalog-migrator-cli-0.0.1-SNAPSHOT.jar`.
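+
+To confirm the version and the resulting JAR name, you can inspect the build output from the `iceberg-catalog-migrator` directory (a quick sanity check, not a required step):
+
+```shell
+cat version.txt
+ls cli/build/libs/
+```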
+
+### Step 2: Set the Object Storage Environment Variables
+The tool will need access to the underlying object storage via environment variables. For this example, we will use AWS S3 with an access key ID and secret:
+```shell
+export AWS_ACCESS_KEY_ID=<access_key>
+export AWS_SECRET_ACCESS_KEY=<secret_key>
+```
+
+For more information on configuring access to object storage, please see [this 
guide](./object-store-access-configuration.md).
+
+### Step 3: Get Access to the Source and Target Catalogs
+The tool will need to be authorized against the source and target catalogs. In this example, we will use two Polaris catalogs. To get access to a Polaris catalog, use the OAuth token endpoint:
+```shell
+curl -X POST http://sourcecatalog:8181/api/catalog/v1/oauth/tokens \
+-d "grant_type=client_credentials" \
+-d "client_id=my-client-id" \
+-d "client_secret=my-client-secret" \
+-d "scope=PRINCIPAL_ROLE:ALL"
+
+export TOKEN_SOURCE=xxxxxxx
+
+curl -X POST http://targetcatalog:8181/api/catalog/v1/oauth/tokens \
+-d "grant_type=client_credentials" \
+-d "client_id=my-client-id" \
+-d "client_secret=my-client-secret" \
+-d "scope=PRINCIPAL_ROLE:ALL"
+
+export TOKEN_TARGET=xxxxxxx
+```
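+
+The token placeholders above must be filled in from each JSON response. If `jq` is available, and assuming the endpoint returns a standard OAuth2 token response with an `access_token` field, the capture can be scripted like this:
+
+```shell
+export TOKEN_SOURCE=$(curl -s -X POST http://sourcecatalog:8181/api/catalog/v1/oauth/tokens \
+  -d "grant_type=client_credentials" \
+  -d "client_id=my-client-id" \
+  -d "client_secret=my-client-secret" \
+  -d "scope=PRINCIPAL_ROLE:ALL" | jq -r '.access_token')
+```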
+
+### Step 4: Validate the Migration
+Execute the following command to review the available options for the operation:
+```shell
+java -jar ./cli/build/libs/iceberg-catalog-migrator-cli-0.0.1-SNAPSHOT.jar 
register -h  
+```
+
+Next, execute the following command to perform a dry run. This will not move any tables but will provide information on the operation:
+```shell
+java -jar ./cli/build/libs/iceberg-catalog-migrator-cli-0.0.1-SNAPSHOT.jar 
register \
+--source-catalog-type REST \
+--source-catalog-properties 
uri=http://sourcecatalog:8181/api/catalog,warehouse=test,token=$TOKEN_SOURCE \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://targetcatalog:8181/api/catalog,warehouse=test,token=$TOKEN_TARGET \
+--dry-run
+```
+
+After validating all inputs, the console will display the list of table identifiers selected for migration. This list will also be written to a file called `dry_run.txt`, which can be edited if needed and reused for the actual run via the `--identifiers-from-file` option, as shown below.
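+
+For instance, restricting the real run to the dry-run output might look like the following sketch (catalog options as in this example):
+
+```shell
+java -jar ./cli/build/libs/iceberg-catalog-migrator-cli-0.0.1-SNAPSHOT.jar migrate \
+--source-catalog-type REST \
+--source-catalog-properties uri=http://sourcecatalog:8181/api/catalog,warehouse=test,token=$TOKEN_SOURCE \
+--target-catalog-type REST \
+--target-catalog-properties uri=http://targetcatalog:8181/api/catalog,warehouse=test,token=$TOKEN_TARGET \
+--identifiers-from-file dry_run.txt
+```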
+
+### Step 5: Migrate the Tables
+
+In the example, execute the following command to perform a migration:
+```shell
+java -jar ./cli/build/libs/iceberg-catalog-migrator-cli-0.0.1-SNAPSHOT.jar 
migrate \
+--source-catalog-type REST \
+--source-catalog-properties 
uri=http://sourcecatalog:8181/api/catalog,warehouse=test,token=$TOKEN_SOURCE \
+--target-catalog-type REST  \
+--target-catalog-properties 
uri=http://targetcatalog:8181/api/catalog,warehouse=test,token=$TOKEN_TARGET
+```
+
+Please note that a log file will be created, which you can use to verify that the migration proceeded successfully; see the sketch below.
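+
+The previous version of this README referred to the log file as `catalog_migration.log`; treat the exact name and location as an assumption for your build. A quick scan for problems might look like:
+
+```shell
+grep -iE "error|failed" catalog_migration.log
+```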
+If any issues occur, please use [the troubleshooting 
guide](./troubleshooting.md).
+
+For more example migrations, please see [this guide](./examples.md).
diff --git a/iceberg-catalog-migrator/docs/object-store-access-configuration.md 
b/iceberg-catalog-migrator/docs/object-store-access-configuration.md
new file mode 100644
index 0000000..a28b760
--- /dev/null
+++ b/iceberg-catalog-migrator/docs/object-store-access-configuration.md
@@ -0,0 +1,36 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+# Object Store Access Configuration
+
+This document provides a guide on how to configure access to object stores for 
the Iceberg Catalog Migrator.
+
+## AWS S3
+For AWS, you can use the following environment variables:
+```shell
+export AWS_ACCESS_KEY_ID=xxxxxxx
+export AWS_SECRET_ACCESS_KEY=xxxxxxx
+export AWS_S3_ENDPOINT=xxxxxxx
+```
+
+## ADLS
+For ADLS, you can use the following environment variables:
+```shell
+export AZURE_SAS_TOKEN=xxxxxxx
+```
diff --git a/iceberg-catalog-migrator/docs/troubleshooting.md 
b/iceberg-catalog-migrator/docs/troubleshooting.md
new file mode 100644
index 0000000..dd44493
--- /dev/null
+++ b/iceberg-catalog-migrator/docs/troubleshooting.md
@@ -0,0 +1,37 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+# Troubleshooting
+
+This document provides troubleshooting information for common issues 
encountered while using the Iceberg Catalog Migrator:
+1. [Errors while migrating tables](#errors-while-migrating-tables)
+2. [Manually aborting the migration](#manually-aborting-the-migration)
+
+## Errors while Migrating Tables
+There can be errors while migrating tables. These errors can come from the 
source or the target. To troubleshoot:
+1. Look at the console output or the log file to identify the failed tables. In the logs, there will be an exception stacktrace for each failed table.
+2. If the migration of those tables failed with a `TableAlreadyExists` exception, there is a conflict between table identifiers in the target catalog. Users can rename the tables in the source catalog and retry migrating only those tables using any of the identifier options.
+3. If the migration of those tables failed with a `ConnectionTimeOut` exception, users can retry migrating only those tables using the `--identifiers-from-file` option with the `failed_identifiers.txt` file created in the output directory (see the sketch after this list).
+4. If the migration succeeds but deletion of some tables from the source catalog fails, the summary will mention that these table names were written to the `failed_to_delete.txt` file, and the logs will capture the failure reason. Do not operate on these tables from the source catalog; they will have to be deleted manually.
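+
+For the retry in item 3, a run limited to the failed tables might look like the following sketch (the catalog options are illustrative and should match your original run):
+
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type HIVE \
+--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
+--target-catalog-type REST \
+--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN \
+--identifiers-from-file failed_identifiers.txt
+```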
+
+## Manually Aborting the Migration
+If a migration was manually aborted:
+1. Determine the number of migrated tables. A user can either review the log, use the `listTables()` function on the target catalog, or run the tool with `--dry-run` against the source catalog to see which tables remain there (sketched after this list).
+2. Migrated tables may not be deleted from the source catalog. Users should 
avoid manipulating these tables in the source catalog.
+3. To recover, the user can manually remove these tables from the source 
catalog or attempt a bulk migration to transfer all tables from the source 
catalog. Please note that this may result in several `TableAlreadyExists` 
exceptions as many of the tables may have already been migrated.
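+
+As referenced in step 1, one way to see which tables remain in the source catalog, without changing anything, is a `--dry-run`; a sketch, with illustrative catalog options:
+
+```shell
+java -jar iceberg-catalog-migrator-cli-0.0.1.jar migrate \
+--source-catalog-type HIVE \
+--source-catalog-properties warehouse=s3a://some-bucket/wh/,io-impl=org.apache.iceberg.aws.s3.S3FileIO,uri=thrift://localhost:9083 \
+--target-catalog-type REST \
+--target-catalog-properties uri=http://localhost:60904/api/catalog,warehouse=test,token=$TOKEN \
+--dry-run
+```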

