singhpk234 commented on issue #3236: URL: https://github.com/apache/iceberg/issues/3236#issuecomment-986726809
Thanks @RussellSpitzer. Could you help me with a few clarifications, or point me to some resources? :)

When you say "groups which have a single file", do you mean the packed groups?
https://github.com/apache/iceberg/blob/adc6d153b26e7f982d75b427f62002e32f1913fc/core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java#L137-L139

That would imply an additional condition when filtering `potentialGroups`:

```
(group.size() >= minInputFiles && group.size() != 1) || sizeOfInputFiles(group) > targetFileSize
```

Or do you mean skipping all groups when `minInputFiles = 1`?

As I understand it, this issue aims to save operations: when a group contains a single file that is already within the target size, there is no need to rewrite it, so it can be a no-op. But if a file is larger than the target size, it makes sense to rewrite it into optimally sized files. Skipping all groups when this config is set to 1 would therefore be incorrect, and we should instead modify the filter condition in the `potentialGroups` filtering.
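To make the proposed condition concrete, here is a minimal, self-contained sketch of how it would behave. It is not the actual `BinPackStrategy` code: the class name `GroupFilterSketch`, the helpers `shouldRewrite` and `filterGroups`, and the representation of each file as a bare size in bytes are all assumptions made for illustration. Only the boolean condition itself is taken from the comment above.

```java
import java.util.List;
import java.util.stream.Collectors;

public class GroupFilterSketch {

  // Hypothetical stand-in: a "file" is just its size in bytes.
  static long sizeOfInputFiles(List<Long> group) {
    return group.stream().mapToLong(Long::longValue).sum();
  }

  // The condition proposed in the comment: a group qualifies for rewrite if it has
  // enough files and is not a single-file group, or if its total size exceeds the target.
  static boolean shouldRewrite(List<Long> group, int minInputFiles, long targetFileSize) {
    return (group.size() >= minInputFiles && group.size() != 1)
        || sizeOfInputFiles(group) > targetFileSize;
  }

  // Applies the condition to every candidate group (illustrative helper).
  static List<List<Long>> filterGroups(
      List<List<Long>> potentialGroups, int minInputFiles, long targetFileSize) {
    return potentialGroups.stream()
        .filter(group -> shouldRewrite(group, minInputFiles, targetFileSize))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    long targetFileSize = 512L;
    List<List<Long>> potentialGroups = List.of(
        List.of(100L),         // single file within target: should be skipped
        List.of(900L),         // single oversized file: should still be rewritten
        List.of(100L, 200L));  // two small files: should be compacted
    System.out.println(filterGroups(potentialGroups, 1, targetFileSize));
  }
}
```

With `minInputFiles = 1`, the single 100-byte file is dropped from the candidates, while the oversized single file and the two-file group are kept, which is the behaviour I would expect from this change.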
