zhangjun0x01 commented on pull request #2217:
URL: https://github.com/apache/iceberg/pull/2217#issuecomment-780382563


   > I have just found this PR when I was wondering how to migrate existing 
Hive tables to Iceberg tables.
   > My use-case is that I have an existing Hive table, and I would like to 
convert it to a Hive table backed by an Iceberg table in-place, and without 
moving the actual data. I would like to create the corresponding manifest files 
and the first snapshot using the existing files.
   > 
   > When I sketched the code I found that I was listing the partitions / 
creating `DataFile` objects / creating a new Iceberg table and adding the data 
files to it. Then I found that you have already done the same in 
`Actions.migrateHive2Iceberg`. At first glance there is not much Flink-specific 
code in this change. Would it be hard to create a general tool for the 
migration instead of a Flink-specific action?
   > 
   > Thanks,
   > Peter
   
   We can make a tool to do this migration work. Snapshot expiration is a 
precedent: Iceberg provides both a Java API and a Spark action for it. For 
large Hive tables, though, a migration done only through the Java API may be 
very slow, and using an engine (Spark or Flink) can speed it up. If we only 
need to migrate a small Hive table, we can run the Flink program on our own 
machine, just like a test case. What do you think?
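
To make the trade-off concrete, here is a minimal sketch in plain Java (standard library only, no Iceberg or Flink dependencies). It is not the code in this PR: `MigrationSketch`, `FileEntry`, `entriesFor`, `planSequential` and `planParallel` are hypothetical names, and the input map stands in for an already-listed Hive table (partition -> file path -> size in bytes). In a real migration each entry would become an Iceberg `DataFile`. The sequential method mirrors a single-process run, and the parallel one mirrors, very roughly, how an engine spreads the per-partition work across tasks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class MigrationSketch {

    /** Hypothetical stand-in for the per-file metadata a migration records. */
    static final class FileEntry {
        final String partition;
        final String path;
        final long sizeBytes;

        FileEntry(String partition, String path, long sizeBytes) {
            this.partition = partition;
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Builds entries for one partition; files maps path -> size in bytes. */
    static List<FileEntry> entriesFor(String partition, Map<String, Long> files) {
        List<FileEntry> out = new ArrayList<>();
        for (Map.Entry<String, Long> f : files.entrySet()) {
            out.add(new FileEntry(partition, f.getKey(), f.getValue()));
        }
        return out;
    }

    /** Single-threaded loop over partitions, as a plain API-only tool would do. */
    static List<FileEntry> planSequential(Map<String, Map<String, Long>> table) {
        List<FileEntry> all = new ArrayList<>();
        for (Map.Entry<String, Map<String, Long>> p : table.entrySet()) {
            all.addAll(entriesFor(p.getKey(), p.getValue()));
        }
        return all;
    }

    /** Same work, fanned out per partition over the common fork-join pool. */
    static List<FileEntry> planParallel(Map<String, Map<String, Long>> table) {
        return table.entrySet().parallelStream()
            .flatMap(p -> entriesFor(p.getKey(), p.getValue()).stream())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up example data: two partitions, three files in total.
        Map<String, Long> day1 = new TreeMap<>();
        day1.put("/warehouse/t/dt=2021-01-01/a.parquet", 10L);
        day1.put("/warehouse/t/dt=2021-01-01/b.parquet", 20L);
        Map<String, Long> day2 = new TreeMap<>();
        day2.put("/warehouse/t/dt=2021-01-02/c.parquet", 30L);

        Map<String, Map<String, Long>> table = new TreeMap<>();
        table.put("dt=2021-01-01", day1);
        table.put("dt=2021-01-02", day2);

        System.out.println("sequential=" + planSequential(table).size()
            + " parallel=" + planParallel(table).size());
    }
}
```

Both methods produce the same entries. The difference only matters when the per-partition step is expensive (for example, reading file footers for metrics) and the table has many partitions, which is the case where an engine-based action should help most.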


