Stephen-Robin edited a comment on issue #2195: URL: https://github.com/apache/iceberg/issues/2195#issuecomment-771725145
> I'm not sure how the above line would cause data loss, it looks like it just there to determine which files to be re-written. You are correct it will not split files, but I'm not sure why they would be dropped.

@RussellSpitzer Thanks for your reply. Consider a "large file" whose size is larger than 'targetSizeInBytes'. The part of this large file that fills a whole bin by itself will not be rewritten. However, the whole large file will still be deleted, because it is in currentDataFiles.

Take the example from the issue. Part A (10M) of the file is not in currentFile, part B (1M) is in currentFile, and part A (10M) is not in addedFile. Yet after the commit, the manifest entry for the original 11M file is marked as deleted. As a result, the 10M of data in part A is no longer referenced by any live data file.
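To make the accounting concrete, here is a minimal Python sketch of the scenario. It is a simplified model, not Iceberg's actual rewrite code. The helpers `plan_splits`, `bin_pack` and `simulate_rewrite` are hypothetical, and so is the first-fit packing. The rule that a bin holding a single split is skipped is also an assumption, as is the extra 3M `small.parquet` file that gets packed together with part B. The sketch shows how deleting every file that contributes a split to a rewritten bin removes more data than is rewritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    file: str     # data file this split was cut from
    index: int    # position of the split within that file
    size_mb: int  # portion of the file covered by this split


def plan_splits(files, target_mb):
    """Cut every file into splits of at most target_mb (hypothetical model)."""
    splits = []
    for path, size_mb in files.items():
        offset, index = 0, 0
        while offset < size_mb:
            length = min(target_mb, size_mb - offset)
            splits.append(Split(path, index, length))
            offset += length
            index += 1
    return splits


def bin_pack(splits, target_mb):
    """First-fit bin packing: each bin holds at most target_mb of splits."""
    bins = []
    for s in splits:
        for b in bins:
            if sum(x.size_mb for x in b) + s.size_mb <= target_mb:
                b.append(s)
                break
        else:
            bins.append([s])
    return bins


def simulate_rewrite(files, target_mb):
    bins = bin_pack(plan_splits(files, target_mb), target_mb)
    # Assumption: a bin holding a single split is treated as already
    # well-sized and is not rewritten (this is where part A is skipped).
    rewritten = [b for b in bins if len(b) > 1]
    # Accounting described in the comment: every file that contributes a
    # split to a rewritten bin is marked deleted, even if some of its
    # other splits were skipped.
    deleted = {s.file for b in rewritten for s in b}
    rewritten_mb = sum(s.size_mb for b in rewritten for s in b)
    deleted_mb = sum(files[f] for f in deleted)
    return deleted, rewritten_mb, deleted_mb


# 11M file from the issue plus a hypothetical 3M file, with a 10M target.
deleted, rewritten_mb, deleted_mb = simulate_rewrite(
    {"big.parquet": 11, "small.parquet": 3}, target_mb=10
)
print(sorted(deleted))           # ['big.parquet', 'small.parquet']
print(rewritten_mb, deleted_mb)  # 4 14
```

In this model, 14M of data is marked deleted while only 4M (part B plus the small file) is rewritten. The 10M difference is part A. Under these assumptions, deleting a whole file is only safe when every one of its splits lands in a rewritten bin.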
