aokolnychyi opened a new issue #1591:
URL: https://github.com/apache/iceberg/issues/1591
One should be able to use the MIGRATE command to migrate existing tables to
Iceberg. Similar to SNAPSHOT, it should use the existing table definition to
create a new Iceberg table and generate metadata for existing files. Apart from
that, it should either swap the table pointer in the original catalog or rename
the original table to a backup table (depending on circumstances, let's
discuss). Once the table has been migrated to Iceberg, all writes and reads
have to be done through Iceberg. In other words, the original table should no
longer be accessible to non-Iceberg readers.
```
MIGRATE TABLE t [AS t2]
USING iceberg
[TBLPROPERTIES ('key' = 'value')]
```
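To make the proposal concrete, here is a hypothetical invocation of the command sketched above; the table name, the property, and the backup naming are illustrative only and not part of any existing implementation:
```
-- Hypothetical usage of the proposed command; db.events and the shown
-- property are made up for this sketch.
MIGRATE TABLE db.events
USING iceberg
TBLPROPERTIES ('format-version' = '1');
-- Expected outcome: db.events is now an Iceberg table backed by the original
-- data files, and (depending on the chosen strategy) the original table is
-- either replaced in place or renamed to a backup such as db.events_backup.
```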
In query engines like Spark, where we have a notion of a custom catalog, we
may not always be able to swap a pointer in the original catalog because the
source and target catalogs may differ. For example, a user may want to move a
regular Spark table whose pointer lives in the Hive Metastore (HMS) into an
Iceberg Hadoop catalog. For such cases we may want to consider exposing an AS
target clause, as sketched below.
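For illustration, a cross-catalog migration could then be expressed roughly as follows; the catalog and table names are hypothetical and assume both catalogs are configured in the Spark session:
```
-- Hypothetical cross-catalog migration using the proposed AS target clause:
-- the source table lives in an HMS-backed catalog, the target in an Iceberg
-- Hadoop catalog. Catalog and table names are made up for this sketch.
MIGRATE TABLE hive_cat.db.events AS hadoop_cat.db.events
USING iceberg;
```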
It is important to note that MIGRATE should inherit the location of the
original table, and new files must be written in the same layout. For example,
when migrating an existing dataset, Iceberg must set the data location to the
root table location rather than using a separate data folder. Users should be
prohibited from modifying the data location.
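As a rough way to verify this behaviour, assuming a Spark session with an Iceberg catalog configured, standard Spark SQL introspection commands could be used to confirm that the migrated table still reports the original root location:
```
-- Hypothetical post-migration check (table name is illustrative): the reported
-- location should match the original table's root, and new data files should
-- be written directly under it rather than into a separate data/ subdirectory.
DESCRIBE TABLE EXTENDED db.events;
SHOW TBLPROPERTIES db.events;
```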