aokolnychyi opened a new issue #1590:
URL: https://github.com/apache/iceberg/issues/1590
If you have an existing Spark or Hive table that complies with the standard
Hive table format, you should be able to use the SNAPSHOT command to safely
test out your workloads on top of Iceberg without affecting the original table.
The SNAPSHOT command should use the definition (including the schema and
partition spec) of the original table to create a new Iceberg table that
contains metadata for files currently present in the original table.
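To make "metadata for files currently present" concrete, here is a minimal Python sketch of the intended semantics. The new table reuses the source definition (schema and partitioning) and records references to the existing data files in place, without copying or moving them. The names `SourceTable`, `DataFileEntry`, `SnapshotTable` and `snapshot_metadata` are hypothetical and do not correspond to Iceberg's Java API. Real Iceberg metadata (manifests, snapshots) also carries file sizes, record counts and column statistics, and the file listing, passed in here as a plain dictionary, would in practice come from the source catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Partition = Tuple[Tuple[str, str], ...]  # e.g. (("day", "2020-10-01"),)


@dataclass(frozen=True)
class SourceTable:
    # Hypothetical, simplified view of a Hive-format table definition.
    name: str
    schema: Tuple[Tuple[str, str], ...]  # (column name, type) pairs
    partition_columns: Tuple[str, ...]
    location: str


@dataclass(frozen=True)
class DataFileEntry:
    # A reference to an existing file; the file itself is never copied.
    path: str
    partition: Partition


@dataclass
class SnapshotTable:
    name: str
    schema: Tuple[Tuple[str, str], ...]
    partition_columns: Tuple[str, ...]
    location: str
    files: List[DataFileEntry] = field(default_factory=list)


def snapshot_metadata(source: SourceTable, target_name: str, target_location: str,
                      files_by_partition: Dict[Partition, List[str]]) -> SnapshotTable:
    """Build a new table that reuses the source definition and points at the
    source's current files. Later changes to the source are not tracked."""
    target = SnapshotTable(
        name=target_name,
        schema=source.schema,                        # same schema as the source
        partition_columns=source.partition_columns,  # same partitioning as the source
        location=target_location,                    # new, isolated location
    )
    for partition, paths in files_by_partition.items():
        for path in paths:
            # Only metadata is recorded; the source file stays where it is.
            target.files.append(DataFileEntry(path=path, partition=partition))
    return target
```

Because the result only holds references, taking the snapshot is cheap, and the source table is never written to.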
The SNAPSHOT command should accept an optional location for the new Iceberg
table. In addition, the command must validate that the Iceberg table location,
as well as the data and metadata locations, are different from the original
Hive table location to ensure that the Hive table will be unaffected.
You should be able to read from and write to the created Iceberg table. New files
will be written to the isolated location. Subsequent changes to the original
Hive table will not be propagated to Iceberg.
```
SNAPSHOT TABLE source AS target
USING iceberg
[LOCATION 'iceberg_table_location']
[TBLPROPERTIES ('key' 'value')]
```
Right now, SNAPSHOT can be limited to generating metadata for existing files in
supported formats (e.g. Avro, Parquet, ORC). In the future, we can also
consider rewriting files in unsupported formats such as CSV or JSON. Source
tables can be Iceberg tables too. For example, someone may want to snapshot a
prod table and use it for testing.
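Limiting SNAPSHOT to supported formats means each existing file has to be classified before any metadata is written. This minimal Python sketch assumes the format can be inferred from the file extension. That is illustrative only: Hive often writes files without extensions (e.g. `000000_0`), so a real implementation would more likely take the format from the source table's declared storage format. The names `SUPPORTED_FORMATS`, `infer_format` and `split_by_format` are hypothetical.

```python
import posixpath
from typing import Dict, List, Tuple

# Formats that can be registered as-is; CSV or JSON would need rewriting first.
SUPPORTED_FORMATS = {"avro", "parquet", "orc"}


def infer_format(path: str) -> str:
    # Assumption: the extension identifies the format, e.g. "part-0.parquet" -> "parquet".
    # Returns "" for files without an extension.
    _, ext = posixpath.splitext(path)
    return ext.lstrip(".").lower()


def split_by_format(paths: List[str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group supported files by format and collect unsupported ones separately."""
    supported: Dict[str, List[str]] = {}
    unsupported: List[str] = []
    for path in paths:
        fmt = infer_format(path)
        if fmt in SUPPORTED_FORMATS:
            supported.setdefault(fmt, []).append(path)
        else:
            unsupported.append(path)
    return supported, unsupported
```

One possible design, assumed here rather than stated in the proposal, is to fail the command when the unsupported list is non-empty instead of producing a partial snapshot that silently drops data.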