[GitHub] [iceberg] rymurr commented on issue #2068: Procedure for adding files to a Table

GitBox Wed, 13 Jan 2021 05:59:54 -0800


rymurr commented on issue #2068:
URL: https://github.com/apache/iceberg/issues/2068#issuecomment-759466345



   I agree with @electrum that a huge strength of Iceberg is its strong 
specification. I would be reluctant to do anything that could weaken that or 
give users a footgun with respect to the strength of the metadata. Likewise, I 
agree that reading the files is probably required to do this safely.
   
   However, we are currently working on our own version of the 'make these 
files an Iceberg table' function, and it sounds like several of these already 
exist in other places. From the comments, this is being driven by the desire to 
avoid copying files around, especially on S3. Our use case is the same, and the 
conversion to Iceberg will be done primarily by users (as opposed to a 
super-user). 
   
   I would be interested in helping to derive/implement a spec that defines the 
canonical Iceberg approach to importing a set of files without 
moving/copying/renaming (I guess this is similar to the MIGRATE Spark action?). 
At the very least, this is likely going to have to read the file footers and 
verify the schema and partition spec. I see a lot of value in this function 
working (at least partially) across engines so that all users/systems can take 
advantage of it.
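
   As a rough sketch of what the footer check could look like: this is not an 
existing Iceberg API. `FooterInfo`, `check_file` and `plan_import` are 
hypothetical names, the footers are assumed to have been read already by some 
engine-specific reader, and the strict "every column must be present with the 
same type" rule is an assumption that a real spec might relax.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class FooterInfo:
    """Hypothetical summary of one data file's footer."""
    path: str
    columns: Dict[str, str]    # column name -> type name, as read from the footer
    partition: Dict[str, str]  # partition field -> value known for this file


def check_file(footer: FooterInfo,
               table_schema: Dict[str, str],
               partition_fields: List[str]) -> List[str]:
    """Return a list of problems; an empty list means the file looks importable."""
    problems = []
    # Assumption: every table column must exist in the file with the same type.
    for name, type_name in table_schema.items():
        actual = footer.columns.get(name)
        if actual is None:
            problems.append(f"missing column {name}")
        elif actual != type_name:
            problems.append(f"column {name} is {actual}, expected {type_name}")
    # Every field of the partition spec needs a known value for the file.
    for field in partition_fields:
        if field not in footer.partition:
            problems.append(f"no value for partition field {field}")
    return problems


def plan_import(footers: List[FooterInfo],
                table_schema: Dict[str, str],
                partition_fields: List[str]
                ) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split files into those safe to register in place and those rejected."""
    accepted: List[str] = []
    rejected: Dict[str, List[str]] = {}
    for footer in footers:
        problems = check_file(footer, table_schema, partition_fields)
        if problems:
            rejected[footer.path] = problems
        else:
            accepted.append(footer.path)
    return accepted, rejected
```

   Only the accepted paths would then be registered in table metadata, so no 
file is moved, copied or renamed.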
   
   cc @vvellanki


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



