Re: migrating Hadoop tables to tables with hive catalog

Russell Spitzer Thu, 01 Jul 2021 06:34:02 -0700

I think you could probably also do this by just creating a Hive table and then 
changing its location to point to the most recent Hadoop metadata.json file.
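
To make "most recent" concrete, here is a minimal sketch of picking that file 
from a listing of the table's metadata directory. It assumes the Hadoop table 
names its metadata files v<N>.metadata.json (optionally with a codec extension 
such as v<N>.gz.metadata.json); the class and method names are hypothetical 
and not part of the Iceberg API.

```java
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Iceberg: picks the highest-versioned
// metadata file name from a directory listing.
class LatestMetadata {
    // Assumed naming: v<N>.metadata.json, with an optional codec
    // extension (for example "gz") before ".metadata.json".
    private static final Pattern VERSIONED =
        Pattern.compile("^v(\\d{1,9})(\\.[A-Za-z0-9]+)?\\.metadata\\.json$");

    static Optional<String> latest(List<String> fileNames) {
        String best = null;
        int bestVersion = -1;
        for (String name : fileNames) {
            Matcher m = VERSIONED.matcher(name);
            if (!m.matches()) {
                continue; // skip version-hint.text and other files
            }
            // Compare versions numerically so that v10 ranks above v9.
            int version = Integer.parseInt(m.group(1));
            if (version > bestVersion) {
                bestVersion = version;
                best = name;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

The comparison is numeric on purpose: a plain string sort would rank 
v9.metadata.json above v10.metadata.json.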


> On Jul 1, 2021, at 1:42 AM, Huadong Liu <huadong...@gmail.com> wrote:
> 
> FYI, I was able to do the migration by casting each ManifestFile to 
> GenericManifestFile, resetting its sequence number and snapshot id, and 
> adding the manifests to AppendFiles.
> 
> On Mon, Jun 28, 2021 at 3:49 PM Huadong Liu <huadong...@gmail.com> wrote:
> Hi,
> 
> I am trying to migrate an Iceberg Hadoop table to a table using the Hive 
> catalog. Luckily the table is append-only, so there are no delete files. It 
> is not clear which APIs were used in a previous post 
> <https://lists.apache.org/thread.html/r39f2c773bc06889cb19d7de3729d868fccbafbafcfab1922332a4dc6%40%3Cdev.iceberg.apache.org%3E>.
> 
> The list of ManifestFiles in the current snapshot can be obtained with the 
> Snapshot allManifests 
> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Snapshot.html#allManifests-->
> API. However, they cannot be added to the new table's AppendFiles 
> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/AppendFiles.html>
> for committing because each manifest's snapshot id needs to be unset 
> <https://github.com/apache/iceberg/blob/master/core/src/main/java/org/apache/iceberg/MergeAppend.java#L55>.
> 
> Alternatively, the Table snapshots 
> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Table.html#snapshots-->
> API can be used to get all snapshots of the table. From there, the data 
> files added by each snapshot can be obtained with the addedFiles 
> <https://iceberg.apache.org/javadoc/0.11.1/org/apache/iceberg/Snapshot.html#addedFiles-->
> API and then added to the AppendFiles of the new table in the Hive catalog.
> 
> I am not sure whether the latter approach is correct for the migration. Any 
> input is appreciated.
> 
> --
> Huadong
