RussellSpitzer commented on issue #3236:
URL: https://github.com/apache/iceberg/issues/3236#issuecomment-989504872


   We are talking about "potential groups" whose contents, if rewritten, would 
not produce files any different from the group's contents before the rewrite.
   
   The easy examples are:
   A group containing a single file that is unaffected by deletes and cannot 
be split into multiple files should not be rewritten, since the rewrite would 
reproduce the same file.
   A group with a single file that is modified by deletes should be rewritten 
(if the rewrite deletes filter is on).
   A group with a single very large file should be rewritten (but only if we 
can split it into target-size files).
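
   The three single-file cases above can be sketched as one predicate. This is 
only an illustration: the class, method, and parameter names are hypothetical 
and not part of the Iceberg API, and "very large" is assumed here to mean 
"larger than the target file size".

```java
// Hypothetical sketch; none of these names are Iceberg APIs.
final class SingleFileGroupCheck {

    /** Returns true if rewriting a group that holds exactly one file would change anything. */
    static boolean shouldRewrite(long fileSizeBytes,
                                 boolean affectedByDeletes,
                                 boolean deleteFilterEnabled,
                                 boolean splittable,
                                 long targetSizeBytes) {
        // Deletes apply to the file and the rewrite is configured to act on them.
        if (affectedByDeletes && deleteFilterEnabled) {
            return true;
        }
        // The file is oversized (assumption: larger than the target) and can actually be split.
        if (fileSizeBytes > targetSizeBytes && splittable) {
            return true;
        }
        // Otherwise the rewrite would reproduce the same file, so the group can be skipped.
        return false;
    }
}
```

   A caller would drop any single-file group for which this returns false 
before planning the rewrite.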
   
   More complicated examples, which you don't have to cover in this PR, are 
things like:
   A group containing multiple files that cannot be combined into more 
efficient files. For example, we currently do not support bin-packing portions 
of row groups, so if you have 3 files and each has a single 100 MB row group, 
we won't be able to rewrite them into, say, two 150 MB files, since the 
original files cannot be split except on row group boundaries.
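
   To make the row group arithmetic concrete, here is a rough sketch that 
packs whole row groups into files and checks whether the file count drops. 
It uses a plain first-fit pass as a stand-in, not Iceberg's actual 
bin-packing, and it treats the target size as a hard cap; both are 
assumptions, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; first-fit stands in for the real bin-packing logic.
final class RowGroupPacking {

    /** Packs whole row groups into files of at most targetSize and returns the file count. */
    static int packedFileCount(List<Long> rowGroupSizes, long targetSize) {
        List<Long> bins = new ArrayList<>();
        for (long size : rowGroupSizes) {
            boolean placed = false;
            for (int i = 0; i < bins.size(); i++) {
                // Row groups are indivisible, so one only fits if the whole group fits.
                if (bins.get(i) + size <= targetSize) {
                    bins.set(i, bins.get(i) + size);
                    placed = true;
                    break;
                }
            }
            if (!placed) {
                bins.add(size); // a row group that fits nowhere gets a file of its own
            }
        }
        return bins.size();
    }

    /** True only if repacking on row group boundaries yields fewer files than we have now. */
    static boolean rewriteReducesFileCount(List<Long> rowGroupSizes,
                                           int currentFileCount,
                                           long targetSize) {
        return packedFileCount(rowGroupSizes, targetSize) < currentFileCount;
    }
}
```

   With three 100 MB row groups and a 150 MB target, no two row groups fit in 
one file, so the packing still needs three files and the group is a no-op.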


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
