[GitHub] [iceberg] JingsongLi commented on pull request #1318: Add equality field IDs to DeleteFile

GitBox Tue, 11 Aug 2020 02:07:09 -0700


JingsongLi commented on pull request #1318:
URL: https://github.com/apache/iceberg/pull/1318#issuecomment-671826837



   Hi @rdblue, thanks for your work. These two PRs look very good.
   
   I have two comments:
   
   ## Optimization for Upsert data
   
   Because upserting data writes an insert file and a delete file at the same 
time, it can double the storage.
   
   I'm thinking about some scenarios:
   - The downstream does not need to restore the CDC data stream.
   - Downstream engines only need the PKs (equality field IDs) for delete 
records.
   
   How can we reduce storage in these scenarios? Can the additional (non-key) 
fields be null?
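
To make the second scenario concrete, here is a minimal sketch, not Iceberg's API. It assumes rows are plain dicts keyed by field ID, and the helper names `project_to_equality_fields` and `apply_equality_deletes` are hypothetical. It only shows that a delete record reduced to its equality fields still matches the same data rows; it ignores sequence numbers and file layout.

```python
# Hypothetical illustration only; these helpers are not part of Iceberg.

def project_to_equality_fields(row, equality_field_ids):
    """Keep only the columns named by the equality field IDs."""
    return {fid: row[fid] for fid in equality_field_ids}


def apply_equality_deletes(data_rows, delete_rows, equality_field_ids):
    """Drop data rows whose equality-field values match any delete row."""
    deleted_keys = {
        tuple(d[fid] for fid in equality_field_ids) for d in delete_rows
    }
    return [
        r for r in data_rows
        if tuple(r[fid] for fid in equality_field_ids) not in deleted_keys
    ]


# Assumed field IDs: 1 = id (the key), 2 = name, 3 = payload.
data = [{1: 10, 2: "a", 3: "x"}, {1: 11, 2: "b", 3: "y"}]
full_delete = {1: 10, 2: "a", 3: "x"}

# The slim delete record carries only the key column.
slim_delete = project_to_equality_fields(full_delete, [1])
remaining = apply_equality_deletes(data, [slim_delete], [1])
```

In this sketch the slim record is enough for the delete to apply, so the open question is only whether readers that want the full deleted row (for example to restore a CDC stream) can tolerate the missing columns.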
   
   ## Why equality field IDs in `DeleteFile`?
   
   Why not just define the primary keys on the table? Will the equality field 
IDs differ between files? 
   Or is this intended to support schema evolution? 




