zhangjun0x01 opened a new issue #2162: URL: https://github.com/apache/iceberg/issues/2162
Currently, Flink users who want to migrate an existing Hive table to an Iceberg table need to create the Iceberg table first and then run a SQL statement such as `insert into iceberg_table select * from hive_table`. This approach has two disadvantages: it takes a long time, and it duplicates the data. We therefore need a tool for this.

I have implemented a migration action based on Flink batch jobs. It can currently migrate Hive tables in Avro, Parquet, and ORC formats to Iceberg, and it supports `HadoopCatalog` and `HiveCatalog`. The migration leaves the data in the original Hive table unchanged and creates a new Iceberg table. The new Iceberg table uses the Hive table's existing data files, and the action generates the corresponding Iceberg metadata for them.

However, part of the test code currently depends on Flink 1.12, so I will create a PR after we upgrade Iceberg's Flink version to 1.12.
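To make the in-place idea concrete, here is a minimal conceptual sketch in Python. It is not the Flink action described in this issue and it does not use the Iceberg API. It only illustrates the principle of collecting references to existing data files instead of copying them. The names `DataFileEntry`, `build_metadata_entries`, and `demo`, as well as the directory layout, are hypothetical and chosen for illustration.

```python
import os
import tempfile
from dataclasses import dataclass

# Formats the issue says the migration supports.
SUPPORTED_FORMATS = {".avro": "avro", ".parquet": "parquet", ".orc": "orc"}


@dataclass(frozen=True)
class DataFileEntry:
    """Hypothetical stand-in for a metadata entry pointing at an existing file."""
    path: str
    file_format: str
    size_bytes: int


def build_metadata_entries(table_location):
    """Walk an existing table directory and describe each data file in place.

    Nothing is copied or rewritten; only references to the files are collected.
    """
    entries = []
    for root, _dirs, files in os.walk(table_location):
        for name in sorted(files):
            ext = os.path.splitext(name)[1].lower()
            if ext not in SUPPORTED_FORMATS:
                continue  # skip non-data files such as _SUCCESS markers
            full_path = os.path.join(root, name)
            entries.append(
                DataFileEntry(full_path, SUPPORTED_FORMATS[ext], os.path.getsize(full_path))
            )
    return entries


def demo():
    """Build a tiny fake table directory and list the entries found in it."""
    with tempfile.TemporaryDirectory() as tmp:
        partition_dir = os.path.join(tmp, "dt=2021-01-01")
        os.makedirs(partition_dir)
        with open(os.path.join(partition_dir, "part-0.parquet"), "wb") as f:
            f.write(b"12345")
        with open(os.path.join(partition_dir, "_SUCCESS"), "wb"):
            pass
        return [
            (os.path.basename(e.path), e.file_format, e.size_bytes)
            for e in build_metadata_entries(tmp)
        ]
```

In this sketch the data files stay where they are and only a list of entries is produced, which is why an in-place migration can avoid both the long copy time and the duplicated data of the `insert into ... select` approach.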
