[GitHub] [iceberg] aokolnychyi opened a new issue #1590: Spark SQL Extensions: SNAPSHOT command

GitBox Mon, 12 Oct 2020 15:59:13 -0700


aokolnychyi opened a new issue #1590:
URL: https://github.com/apache/iceberg/issues/1590



   If you have an existing Spark or Hive table that complies with the standard 
Hive table format, you should be able to use the SNAPSHOT command to safely 
test out your workloads on top of Iceberg without affecting the original table. 
The SNAPSHOT command should use the definition (including the schema and 
partition spec) of the original table to create a new Iceberg table that 
contains metadata for files currently present in the original table.
   
   The SNAPSHOT command should accept an optional location for the new Iceberg 
table. In addition, the command must validate that the Iceberg table location, 
as well as the data and metadata locations, are different from the original 
Hive table location to ensure that the Hive table will be unaffected.
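
   As a rough illustration of the location check described above, here is a 
minimal Python sketch. It is not Iceberg code: the function names are 
hypothetical, and it assumes that a location nested under the Hive table 
location should be rejected as well as an identical one.

```python
import posixpath
from urllib.parse import urlparse


def _normalize(location):
    """Split a location into (scheme, authority, normalized path)."""
    parsed = urlparse(location)
    path = posixpath.normpath(parsed.path) if parsed.path else "/"
    return parsed.scheme, parsed.netloc, path


def is_same_or_under(candidate, source):
    """True if candidate equals source or is nested under it (assumed rule)."""
    c_scheme, c_auth, c_path = _normalize(candidate)
    s_scheme, s_auth, s_path = _normalize(source)
    if (c_scheme, c_auth) != (s_scheme, s_auth):
        return False
    if c_path == s_path:
        return True
    # Compare against "source/" so that "/db/src_snapshot" is not
    # mistaken for a child of "/db/src".
    return c_path.startswith(s_path.rstrip("/") + "/")


def validate_snapshot_locations(source_location, table_location,
                                data_location, metadata_location):
    """Raise ValueError if any new location collides with the source table."""
    candidates = [
        ("table", table_location),
        ("data", data_location),
        ("metadata", metadata_location),
    ]
    for name, location in candidates:
        if is_same_or_under(location, source_location):
            raise ValueError(
                f"{name} location {location} overlaps with "
                f"source table location {source_location}"
            )
```

   A real implementation would also have to decide how to treat equivalent 
URIs with different schemes or authorities; this sketch treats them as 
distinct locations.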
   
   You should be able to read from and write to the created Iceberg table. New 
files will be written to the isolated location. Subsequent changes to the 
original Hive table will not be propagated to the Iceberg table.
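
   The isolation semantics above can be modeled with a small toy example in 
Python. This is only a sketch under assumptions: `HiveTable`, `SnapshotTable`, 
and `snapshot` are hypothetical names, and tables are reduced to a location 
plus a list of file paths.

```python
from dataclasses import dataclass, field


@dataclass
class HiveTable:
    location: str
    files: list = field(default_factory=list)


@dataclass
class SnapshotTable:
    location: str
    files: list  # file paths tracked by the snapshot's own metadata

    def append(self, file_name):
        """Write a new file under the snapshot's isolated location."""
        path = f"{self.location}/data/{file_name}"
        self.files.append(path)
        return path


def snapshot(source, location):
    """Copy the current file list of the source table; no data files move."""
    return SnapshotTable(location=location, files=list(source.files))
```

   In this model, the snapshot references the files that exist in the source 
table at creation time, files appended to the source afterwards are not visible 
in the snapshot, and files appended to the snapshot land only under its own 
location.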
   
   ```
   SNAPSHOT TABLE source AS target
   USING iceberg
   [LOCATION 'iceberg_table_location']
   [TBLPROPERTIES ('key' 'value')]
   ```
   
   Initially, SNAPSHOT can be limited to generating metadata for existing files 
in supported formats (e.g., Avro, Parquet, ORC). In the future, we can also 
consider rewriting files in unsupported formats such as CSV or JSON. Source 
tables can be Iceberg tables too. For example, someone may want to snapshot a 
prod table and use it for testing.
   


