Hi, folks,
Our company uses Iceberg, and as part of our efforts to scale that usage,
we are migrating Hive tables to Iceberg tables in bulk. After researching
the community's existing solutions for in-place metadata upgrades (Snapshot
and Migrate), we built on these methods and created a new stored procedure
called hive_to_iceberg.
The migration process in this stored procedure consists of the following
steps:
1. *Step 1*: Create a new snapshot table for the source table (using the
   Snapshot stored procedure).
2. *Step 2*: Migrate the metadata of the new snapshot table back to the
   source table (using the ReWriteTablePath stored procedure).
3. *Step 3*: Update multiple important properties in the Hive Metastore.
4. *Step 4*: Delete the snapshot table.
This stored procedure combines the capabilities of Snapshot and
ReWriteTablePath to effectively migrate Hive tables to Iceberg tables.
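For readers who want to picture the flow, here is a rough sketch of what the steps might look like if issued by hand as Spark SQL, next to the single call the new procedure is meant to replace. This is only an illustration, not the PR's implementation: the argument names for hive_to_iceberg, the catalog and table names, and the path prefixes are all assumptions, and Step 3 is left as a comment because it is a Hive Metastore update rather than a SQL call.

```python
def manual_migration_sql(catalog, source, snapshot, snapshot_prefix,
                         source_prefix, parallelism=1):
    """Build Spark SQL that roughly mirrors Steps 1, 2 and 4 by hand."""
    return [
        # Step 1: snapshot the Hive table into a temporary Iceberg table.
        f"CALL {catalog}.system.snapshot(source_table => '{source}', "
        f"table => '{snapshot}', parallelism => {parallelism})",
        # Step 2: rewrite metadata paths from the snapshot table's location
        # to the source table's location (both prefixes are assumed inputs).
        f"CALL {catalog}.system.rewrite_table_path(table => '{snapshot}', "
        f"source_prefix => '{snapshot_prefix}', "
        f"target_prefix => '{source_prefix}')",
        # Step 3 (updating properties in the Hive Metastore) is not shown.
        # Step 4: drop the temporary snapshot table.
        f"DROP TABLE {catalog}.{snapshot}",
    ]


def hive_to_iceberg_sql(catalog, table, parallelism):
    """Build the single proposed call; argument names are hypothetical."""
    return (f"CALL {catalog}.system.hive_to_iceberg("
            f"table => '{table}', parallelism => {parallelism})")


if __name__ == "__main__":
    for stmt in manual_migration_sql(
            "spark_catalog", "db.events", "db.events_snap",
            "hdfs://nn/warehouse/db.db/events_snap",
            "hdfs://nn/warehouse/db.db/events", parallelism=8):
        print(stmt)
    print(hive_to_iceberg_sql("spark_catalog", "db.events", 8))
```

The point of the comparison is that the caller no longer has to supply a snapshot table name or path prefixes; the procedure would derive them from the source table.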
*The advantages of this solution include:*
1. *Simplified migration parameters*: The user only needs to provide the
   table name and parallelism parameter, with parallelism determined by the
   number of partitions.
2. *Reuse of existing stored procedures*: By recombining the Snapshot and
   ReWriteTablePath stored procedures, we have formed a new Hive-to-Iceberg
   migration process.
3. *Easy rollback*:
   - If there is an issue in Step 1, the user simply needs to manually
     delete the newly created snapshot table.
   - If there is an issue in Step 2, the user can delete the metadata
     directory of the source table.
   - If there is an issue in Step 3, the user can remove the three added
     properties in the Hive Metastore.
   Before writing new data, users can quickly roll back to the Hive
   table by removing the newly added properties in the Hive Metastore.
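The rollback rules above amount to a small lookup from the failed step to a manual action. A minimal sketch of that mapping, e.g. for a runbook or a wrapper script, could look like the following; the function and constant names are hypothetical and not part of the PR.

```python
# Manual rollback per failed step, taken from the list above.
ROLLBACK_ACTIONS = {
    1: "Delete the newly created snapshot table.",
    2: "Delete the metadata directory of the source table.",
    3: "Remove the added properties from the Hive Metastore.",
}


def rollback_action(failed_step):
    """Return the manual rollback described for a failed step."""
    if failed_step not in ROLLBACK_ACTIONS:
        raise ValueError(f"no rollback described for step {failed_step}")
    return ROLLBACK_ACTIONS[failed_step]


if __name__ == "__main__":
    print(rollback_action(2))
```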
I have submitted a PR for this, which can be found here:
https://github.com/apache/iceberg/pull/14209, and have requested reviews
from the members who previously helped me with code reviews. I would
greatly appreciate the community's help in moving this PR forward.
If you have any suggestions or feedback, please feel free to let me know.
Thank you for your support!
Best regards,
Shilun Fan