Hi, folks,

Our company currently uses Iceberg, and as part of our efforts to scale its
usage, we are migrating Hive tables to Iceberg in bulk. After researching the
community's solutions for in-place metadata upgrades (Snapshot and Migrate),
we improved on these methods and created a new stored procedure called
hive_to_iceberg.

The migration process in this stored procedure consists of the following
steps:

   1. *Step 1*: Create a new snapshot table for the source table (using the
      Snapshot stored procedure).
   2. *Step 2*: Migrate the metadata of the new snapshot table back to the
      source table (using the RewriteTablePath stored procedure).
   3. *Step 3*: Update several important properties in the Hive Metastore.
   4. *Step 4*: Delete the snapshot table.

This stored procedure combines the capabilities of Snapshot and
RewriteTablePath to migrate Hive tables to Iceberg tables in place.
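To make the four steps concrete, here is a minimal, hypothetical sketch of the Spark SQL statements such a migration might issue. It is not the code in the PR. It assumes Spark with Iceberg's SQL extensions and uses the existing `snapshot` and `rewrite_table_path` procedures. The `plan_hive_to_iceberg` helper, its parameters, the `_snapshot` suffix, and the use of `table_type`/`metadata_location` as the Hive Metastore properties are all illustrative assumptions. It also leaves out copying the rewritten metadata files (listed in the `file_list_location` output of `rewrite_table_path`) into the source table's metadata directory.

```python
# Hypothetical sketch of the statements a hive_to_iceberg-style migration
# could issue. Helper name, parameters, "_snapshot" suffix, and the HMS
# property names are assumptions, not taken from the PR.


def plan_hive_to_iceberg(table: str, table_location: str,
                         snapshot_location: str, metadata_file: str,
                         catalog: str = "spark_catalog") -> list[str]:
    snapshot_table = f"{table}_snapshot"  # hypothetical naming convention
    return [
        # Step 1: create an Iceberg snapshot table that references the
        # source table's existing data files.
        f"CALL {catalog}.system.snapshot("
        f"source_table => '{table}', table => '{snapshot_table}', "
        f"location => '{snapshot_location}')",
        # Step 2: rewrite the snapshot table's metadata paths so that they
        # point at the source table's location.
        f"CALL {catalog}.system.rewrite_table_path("
        f"table => '{snapshot_table}', "
        f"source_prefix => '{snapshot_location}', "
        f"target_prefix => '{table_location}')",
        # Step 3: mark the Hive table as an Iceberg table in the metastore
        # (assumed property names; the third property is not shown).
        f"ALTER TABLE {catalog}.{table} SET TBLPROPERTIES ("
        f"'table_type' = 'ICEBERG', "
        f"'metadata_location' = '{metadata_file}')",
        # Step 4: drop the temporary snapshot table without PURGE, so the
        # data files it shares with the source table are left untouched.
        f"DROP TABLE {catalog}.{snapshot_table}",
    ]
```

Generating plain statements rather than executing them keeps the sketch runnable without a Spark session; each string could be passed to `spark.sql(...)` in order.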

*The advantages of this solution include:*

   1. *Simplified migration parameters*: The user only needs to provide the
      table name and a parallelism parameter, where the parallelism is
      determined by the number of partitions.
   2. *Reuse of existing stored procedures*: By recombining the Snapshot and
      RewriteTablePath stored procedures, we have built a new Hive-to-Iceberg
      migration process.
   3. *Easy rollback*:
      - If Step 1 fails, the user only needs to manually delete the newly
        created snapshot table.
      - If Step 2 fails, the user can delete the metadata directory of the
        source table.
      - If Step 3 fails, the user can remove the three added properties from
        the Hive Metastore.
      - Even after the migration, as long as no new data has been written,
        users can quickly roll back to the Hive table by removing the newly
        added properties from the Hive Metastore.
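The manual rollback described above can be summarized as a small lookup from the failed step to the action to take. This is a hypothetical sketch, not part of the PR: the `rollback_plan` helper, the `_snapshot` suffix, and the property names `table_type` and `metadata_location` are assumptions (the email mentions three added properties but does not name them). The Step 2 action is a filesystem operation and is returned as a plain description rather than SQL.

```python
# Hypothetical sketch: map the step at which a migration failed to the manual
# rollback action. Names and HMS property keys are assumptions, not from the PR.


def rollback_plan(failed_step: int, table: str, metadata_dir: str,
                  catalog: str = "spark_catalog") -> list[str]:
    snapshot_table = f"{catalog}.{table}_snapshot"  # hypothetical suffix
    if failed_step == 1:
        # Step 1 failed: only the new snapshot table has to be removed.
        return [f"DROP TABLE IF EXISTS {snapshot_table}"]
    if failed_step == 2:
        # Step 2 failed: remove the metadata directory written under the
        # source table's location (done on the filesystem, not via SQL).
        return [f"delete directory {metadata_dir}"]
    if failed_step == 3:
        # Step 3 failed, or the migration finished but no new data has been
        # written yet: remove the properties that mark the table as Iceberg.
        return [f"ALTER TABLE {catalog}.{table} UNSET TBLPROPERTIES IF EXISTS "
                f"('table_type', 'metadata_location')"]
    raise ValueError(f"no manual rollback described for step {failed_step}")
```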

I have already submitted the PR to the community, which can be found here:
https://github.com/apache/iceberg/pull/14209, and have requested reviews
from the members who previously helped me with code reviews. I would
greatly appreciate the community's help in moving this PR forward.

If you have any suggestions or feedback, please feel free to let me know.
Thank you for your support!

Best regards,

Shilun Fan