[GitHub] [iceberg] zhangjun0x01 opened a new issue #2162: Flink : migrate existing hive table to iceberg table

GitBox Wed, 27 Jan 2021 00:32:30 -0800


zhangjun0x01 opened a new issue #2162:
URL: https://github.com/apache/iceberg/issues/2162



   Currently, Flink users who want to migrate an existing Hive table to 
an Iceberg table may need to create the Iceberg table first and then run a 
SQL statement like `insert into iceberg_table select * from hive_table`. 
This approach has two disadvantages: it takes a long time, and it 
duplicates the data. Therefore, we need a tool for this migration.
   
   I have implemented a migration action based on Flink batch jobs. Currently, 
it can migrate Hive tables in Avro, Parquet, and ORC formats to Iceberg, and 
it supports `HadoopCatalog` and `HiveCatalog`.
   
   The migration keeps the data in the original Hive table unchanged and 
creates a new Iceberg table. The new Iceberg table uses the Hive table's 
existing data files, and the action generates the corresponding metadata 
for the Iceberg table.
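
   To illustrate the idea of reusing data files in place, here is a minimal 
sketch in plain Python. It is not the Flink action described above and does 
not use the Iceberg API: `DataFileEntry`, `parse_partition`, and 
`collect_data_files` are hypothetical names, and the sketch assumes the file 
format can be inferred from the file extension and that partitions use 
Hive-style `key=value` directories. It only shows that the metadata can be 
built by listing files, without reading or copying any data.

```python
import os
from dataclasses import dataclass
from typing import Dict, List

# Formats the issue says the migration action supports.
SUPPORTED_FORMATS = {"avro", "parquet", "orc"}


@dataclass
class DataFileEntry:
    """Hypothetical stand-in for one data-file record in table metadata."""
    path: str
    file_format: str
    size_bytes: int
    partition: Dict[str, str]


def parse_partition(rel_dir: str) -> Dict[str, str]:
    """Parse Hive-style 'key=value/key=value' directory names."""
    parts: Dict[str, str] = {}
    for segment in rel_dir.split(os.sep):
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = value
    return parts


def collect_data_files(table_location: str) -> List[DataFileEntry]:
    """List existing data files and describe them; nothing is moved or copied."""
    entries: List[DataFileEntry] = []
    for root, _dirs, files in os.walk(table_location):
        rel_dir = os.path.relpath(root, table_location)
        for name in files:
            # Assumption: the extension names the format. Marker files such
            # as _SUCCESS are skipped because they have no known extension.
            ext = os.path.splitext(name)[1].lstrip(".").lower()
            if ext not in SUPPORTED_FORMATS:
                continue
            full_path = os.path.join(root, name)
            entries.append(
                DataFileEntry(
                    path=full_path,
                    file_format=ext,
                    size_bytes=os.path.getsize(full_path),
                    partition=parse_partition(rel_dir),
                )
            )
    return sorted(entries, key=lambda e: e.path)
```

   In a real migration, each entry would be registered in the new Iceberg 
table's metadata and committed through the catalog, while the files 
themselves stay where Hive wrote them.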
   
   However, part of the test code currently depends on Flink 1.12, so I will 
create a PR after we upgrade Iceberg's Flink version to 1.12.




