aokolnychyi opened a new issue #1591:
URL: https://github.com/apache/iceberg/issues/1591


   One should be able to use the MIGRATE command to migrate existing tables to
Iceberg. Similar to SNAPSHOT, it should use the existing table definition to
create a new Iceberg table and generate metadata for the existing data files.
In addition, it should either swap the table pointer in the original catalog or
rename the original table to a backup table (which of the two applies depends
on the circumstances; let's discuss). Once a table has been migrated to
Iceberg, all reads and writes must go through Iceberg; in other words, the
original table should no longer be accessible to non-Iceberg readers.
   
   ```
   MIGRATE TABLE t [AS t2]
   USING iceberg
   [TBLPROPERTIES ('key' = 'value')]
   ```
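
   For illustration, one possible invocation under the proposed grammar could look
like the following (the table name and the property value are made up for the
example; `write.format.default` is an existing Iceberg table property):

   ```
   MIGRATE TABLE db.logs
   USING iceberg
   TBLPROPERTIES ('write.format.default' = 'parquet')
   ```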
   
   In query engines like Spark, where there is a notion of a custom catalog, we
may not always be able to swap the pointer in the original catalog because the
source and target catalogs may differ. For example, you may want to move a
regular Spark table whose pointer is stored in the HMS into an Iceberg Hadoop
catalog. For such cases we may want to consider exposing the optional AS target
clause.
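
   For instance, something along these lines could migrate a Hive-tracked table
into a Hadoop catalog (catalog and table names are purely illustrative; how the
identifiers are resolved is part of the discussion):

   ```
   MIGRATE TABLE spark_catalog.db.logs AS hadoop_prod.db.logs
   USING iceberg
   ```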
   
   It is important to note that MIGRATE should inherit the location of the
original table, and new files must be written in the same layout. For example,
when migrating an existing dataset, Iceberg must use the root table location as
the data location rather than a separate data folder. Users should be
prohibited from modifying the data location.
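
   As a rough sketch (paths are purely illustrative), the layout after migration
keeps the existing data files where they are and only adds Iceberg metadata next
to them, with new data files written into the same directory structure:

   ```
   /warehouse/db.db/logs/                   <- original table location, inherited
     dt=2020-10-01/part-00000-...parquet    <- existing files, registered as-is
     dt=2020-10-02/part-00042-...parquet    <- new writes follow the same layout
     metadata/                              <- Iceberg metadata written alongside
   ```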
   

