JingsongLi edited a comment on pull request #1318:
URL: https://github.com/apache/iceberg/pull/1318#issuecomment-671826837


   Hi @rdblue, thanks for your work. These two PRs look very good.
   
   I have two comments:
   
   ### Optimization for Upsert data
   
   Since an upsert writes both an insert file and a delete file at the 
same time, it can double the storage used.
   
   I'm thinking about some scenarios:
   - The downstream does not need to restore the CDC data stream.
   - Downstream engines only need the PKs (equality field IDs) for 
delete records.
   
   How can we reduce storage in these scenarios? Can the additional (non-key) 
fields in delete records be null?
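
To make the idea concrete, here is a minimal sketch of what "additional fields as nulls" could look like. It is not Iceberg's API: `project_delete_row` is a hypothetical helper, and it uses column names instead of field IDs purely for readability.

```python
from typing import Any, Dict, Iterable


def project_delete_row(row: Dict[str, Any], equality_fields: Iterable[str]) -> Dict[str, Any]:
    """Keep only the equality (key) columns; set every other column to None.

    Hypothetical helper for illustration only, not part of Iceberg.
    """
    keys = set(equality_fields)
    return {name: (value if name in keys else None) for name, value in row.items()}


# A full upsert row carries every column, including a large payload.
full_row = {"id": 42, "name": "alice", "payload": "x" * 1000}

# The delete record only needs the key; the remaining columns become nulls.
delete_row = project_delete_row(full_row, ["id"])
print(delete_row)
```

With a columnar format, all-null columns should encode compactly, so a delete file written this way would cost much less than a full copy of the row.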
   
   ### Why equality field IDs in `DeleteFile`?
   
   Why not just use a primary key definition on the table? Will the equality 
field IDs differ between files? 
   Is this meant to support schema evolution? 
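
To illustrate the question, here is a rough sketch of how per-file equality field IDs would differ from a single table-level key. `DeleteFileMeta` and `is_deleted` are hypothetical names, not Iceberg classes, and whether files are really expected to carry different ID sets is exactly what I am asking.

```python
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class DeleteFileMeta:
    """Hypothetical per-file metadata: each delete file names its own key columns."""
    path: str
    equality_field_ids: Tuple[int, ...]


def is_deleted(data_row: Dict[int, Any], delete_row: Dict[int, Any], meta: DeleteFileMeta) -> bool:
    """A data row is deleted when it matches the delete row on every equality field ID."""
    return all(data_row.get(fid) == delete_row.get(fid) for fid in meta.equality_field_ids)


# Rows are keyed by field ID (assumed layout for this sketch).
row = {1: 42, 2: "eu", 3: "payload"}

# An older delete file written when the key was field 1 only.
old_meta = DeleteFileMeta("delete-1.avro", (1,))
# A newer delete file written after the key became fields (1, 2).
new_meta = DeleteFileMeta("delete-2.avro", (1, 2))

print(is_deleted(row, {1: 42}, old_meta))           # matches on field 1
print(is_deleted(row, {1: 42, 2: "us"}, new_meta))  # field 2 differs, so no match
```

If the key lived only in the table definition, both files would have to be interpreted with the same key, which is why I wonder whether the per-file IDs are there for key changes over time.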


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


