abhinigam opened a new issue, #13477:
URL: https://github.com/apache/iceberg/issues/13477

   ### Proposed Change
   
   An Iceberg table stored on S3 consists of two folders under the table 
location:
   
   - metadata
   - data
   
   The metadata does not point directly at the data folder; it points at the 
parent folder, and the Iceberg libraries look for the data folder under that 
parent to retrieve the data files.
   
   This makes in-place migration from Parquet tables to Iceberg tables 
impossible. If I have an existing S3 folder that contains the data files, and 
Spark code that reads Parquet data from that folder, that code stops working 
once I move to Iceberg, because it now has to read from the data sub-folder 
instead.
   
   This is in sharp contrast to the Delta Lake table format, where the table 
on S3 has only one sub-folder:
   
   - metadata: the metadata.json points directly to the folder containing the 
data, which is the parent folder of metadata.
   
   The data files are not in a sub-folder; they live next to the metadata 
folder, under the same parent folder.
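   
   To make the two layouts concrete, here is a minimal sketch of where a 
reader would look for data files in each case. It only encodes the layouts as 
described in this issue; the function names and the S3 path are hypothetical, 
and it does not call any Iceberg or Delta API.

```python
from posixpath import join


def nested_data_prefix(table_location: str) -> str:
    """Layout described for Iceberg: data files sit in a `data` sub-folder."""
    return join(table_location, "data")


def flat_data_prefix(table_location: str) -> str:
    """Layout described for plain Parquet and Delta: data files sit directly
    in the table folder, next to the metadata folder."""
    return table_location


# Hypothetical folder that existing Spark jobs already read Parquet from.
existing_reader_path = "s3://my-bucket/warehouse/events"

# With the flat layout the existing path keeps working unchanged.
print(flat_data_prefix(existing_reader_path))

# With the nested layout, path-based readers must be repointed at `data`.
print(nested_data_prefix(existing_reader_path))
```

   The point of the sketch is that only the nested layout changes the prefix 
that existing path-based readers depend on.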
   
   From an in-place migration point of view, the Delta table structure makes 
it much easier to migrate from a Parquet snapshot (drop + CTAS tables) than 
migrating to the Iceberg table format.
   
   ### Proposal document
   
   _No response_
   
   ### Specifications
   
   - [ ] Table
   - [ ] View
   - [ ] REST
   - [ ] Puffin
   - [ ] Encryption
   - [ ] Other


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

