RussellSpitzer commented on issue #3236: URL: https://github.com/apache/iceberg/issues/3236#issuecomment-989504872
We are talking about "potential groups" whose contents, if rewritten, would not produce files any different from the group's contents before the rewrite. The easy example is:

- A group which contains a single file that is unaffected by deletes and which cannot be split into multiple files. Rewriting it would just reproduce the same file.

By contrast:

- A group with a single file that is modified by deletes should be rewritten (if the rewrite deletes filter is on).
- A group with a single very large file should be rewritten (but only if we can split it into target-size files).

More complicated examples, which you don't have to cover in this PR, are things like:

- A group which contains multiple files that cannot be combined into more efficient files. For example, we currently do not support bin-packing portions of row groups, so if you have 3 files and each has a single 100 MB row group, we won't be able to rewrite them into, say, two 150 MB files, since the original files cannot be split except on row group boundaries.
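For illustration only, the single-file cases above could be sketched roughly as follows. This is not Iceberg code: `DataFile`, `should_rewrite_single_file_group`, and all field and parameter names are hypothetical, and "can be split" is approximated here as "larger than the target size and made of more than one row group".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    """Hypothetical stand-in for a data file in a single-file group."""
    size_bytes: int
    has_deletes: bool            # True if delete files apply to this data file
    row_group_sizes: List[int]   # sizes of its row groups, in bytes


def should_rewrite_single_file_group(
    data_file: DataFile,
    target_size_bytes: int,
    rewrite_deletes_filter_on: bool,
) -> bool:
    """Return True if rewriting a group holding only this file would change anything."""
    # Case 1: deletes would be applied, so the output differs from the input.
    if rewrite_deletes_filter_on and data_file.has_deletes:
        return True

    # Case 2: the file is too large AND can actually be split.
    # Assumption: splitting is only possible on row group boundaries,
    # so a file with a single row group cannot be split.
    too_large = data_file.size_bytes > target_size_bytes
    splittable = len(data_file.row_group_sizes) > 1
    if too_large and splittable:
        return True

    # Otherwise the rewrite would reproduce the same file, so skip the group.
    return False
```

Under these assumptions, a 100 MB file with one row group and no deletes is skipped, while the same file with applicable deletes is rewritten only when the deletes filter is on. The multi-file case (three files with one 100 MB row group each) is deliberately left out, as in the comment.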
