Stephen-Robin commented on issue #2195:
URL: https://github.com/apache/iceberg/issues/2195#issuecomment-771725145


   > I'm not sure how the above line would cause data loss, it looks like it is 
just there to determine which files are to be rewritten. You are correct it will 
not split files, but I'm not sure why they would be dropped.
   
   Thanks for your reply.
   When we have a 'large file' whose size is larger than 'targetSizeInBytes', 
the parts of this 'large file' that each fill a whole bin will not be 
rewritten. But the whole 'large file' will still be deleted, because it is in 
currentDataFiles.
   
   Using the example mentioned in the issue: file part A (10M) is not in 
currentFile, file part B (1M) is in currentFile, and file part A (10M) is not 
in addedFile either. But after the commit, the manifest status corresponding to 
the initial 11M file will be set to deleted, so the 10M of data in part A is 
lost.
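
   To make the scenario above concrete, here is a minimal, self-contained
sketch of the arithmetic. It is not the Iceberg implementation: the class
`RewriteLossSketch`, the `Split` type and the `lostUnits` method are
hypothetical names, sizes are in abstract units (M), and the "skip any split
that already fills a bin" rule is my simplified assumption about the behaviour
described in this comment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class RewriteLossSketch {
    // Hypothetical stand-in for one split of a data file (not an Iceberg type).
    static final class Split {
        final String file;
        final long length;

        Split(String file, long length) {
            this.file = file;
            this.length = length;
        }
    }

    // Cut a file into splits of at most targetSize units each.
    static List<Split> split(String file, long fileSize, long targetSize) {
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += targetSize) {
            splits.add(new Split(file, Math.min(targetSize, fileSize - offset)));
        }
        return splits;
    }

    // Units of data that are neither rewritten nor kept after the commit.
    static long lostUnits(long fileSize, long targetSize) {
        Set<String> deletedFiles = new HashSet<>();
        long rewritten = 0;
        for (Split s : split("data-1", fileSize, targetSize)) {
            if (s.length >= targetSize) {
                continue; // assumed rule: a split that fills a whole bin is not rewritten
            }
            rewritten += s.length;    // this split ends up in a newly added file
            deletedFiles.add(s.file); // ...and its whole source file is marked deleted
        }
        // If the source file was never marked deleted, nothing is lost.
        return deletedFiles.isEmpty() ? 0 : fileSize - rewritten;
    }

    public static void main(String[] args) {
        System.out.println(lostUnits(11, 10)); // prints 10
    }
}
```

   With an 11M file and a 10M target, only the 1M tail is rewritten, yet the
whole 11M file is marked deleted, so 10M goes missing. A file that is exactly
10M, or smaller than the target, loses nothing in this sketch.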
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


