RussellSpitzer opened a new issue #3236:
URL: https://github.com/apache/iceberg/issues/3236


   Currently we allow MIN_INPUT_FILES to be any positive number, but setting
the value to 1 means that a group containing only a single file will be rewritten.
   
   
https://github.com/apache/iceberg/blob/adc6d153b26e7f982d75b427f62002e32f1913fc/core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java#L234-L236
   
   When this value is set to 1, I see two cases:
   
   1. The group is a single file that is too small. We should ignore this file,
since rewriting it on its own just produces the same small file again.
   2. The group is a single file that is too big. This file should actually be
rewritten as several files of the correct size.
   
   Currently we rewrite in both cases. I think we should add an extra check
so that case 1 is always a no-op.
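   A minimal sketch of the proposed check, under stated assumptions: this is a
simplified stand-in, not the actual BinPackStrategy code. `FileInfo`,
`currentShouldRewrite`, `proposedShouldRewrite`, and the `maxFileSizeBytes`
threshold are hypothetical names chosen for illustration. It contrasts the
current rule (rewrite any group with at least MIN_INPUT_FILES files) with the
proposed rule (a single-file group is rewritten only when that file is too big).

```java
import java.util.List;

// Hypothetical sketch; NOT Iceberg's BinPackStrategy API.
public class SingleFileGroupCheck {

    // Hypothetical stand-in for a data file: only its size in bytes matters here.
    record FileInfo(long sizeInBytes) {}

    // Current behavior (simplified): any group with at least minInputFiles files is rewritten.
    static boolean currentShouldRewrite(List<FileInfo> group, int minInputFiles) {
        return group.size() >= minInputFiles;
    }

    // Proposed behavior (assumed): a single-file group is only rewritten when that file
    // exceeds maxFileSizeBytes, because only then can it be split into correctly sized
    // outputs. A lone small file becomes a no-op.
    static boolean proposedShouldRewrite(List<FileInfo> group, int minInputFiles, long maxFileSizeBytes) {
        if (group.size() < minInputFiles) {
            return false;
        }
        if (group.size() == 1) {
            return group.get(0).sizeInBytes() > maxFileSizeBytes;
        }
        return true;
    }

    public static void main(String[] args) {
        long maxSize = 512L * 1024 * 1024; // assumed 512 MB max output file size
        List<FileInfo> smallSingle = List.of(new FileInfo(10L * 1024 * 1024));
        List<FileInfo> largeSingle = List.of(new FileInfo(2L * 1024 * 1024 * 1024));

        System.out.println(currentShouldRewrite(smallSingle, 1));           // true
        System.out.println(proposedShouldRewrite(smallSingle, 1, maxSize)); // false
        System.out.println(proposedShouldRewrite(largeSingle, 1, maxSize)); // true
    }
}
```

   Groups with two or more files are unaffected by the extra check; it only
narrows the single-file case that MIN_INPUT_FILES=1 makes possible.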


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.