aokolnychyi commented on a change in pull request #2760:
URL: https://github.com/apache/iceberg/pull/2760#discussion_r661033048



##########
File path: core/src/main/java/org/apache/iceberg/actions/BinPackStrategy.java
##########
@@ -137,6 +139,78 @@ public RewriteStrategy options(Map<String, String> options) {
     ).collect(Collectors.toList());
   }
 
+  protected long targetFileSize() {
+    return this.targetFileSize;
+  }
+
+  /**
+   * Determine how many output files to create when rewriting. We use this to determine the split-size
+   * we want to use when actually writing files to avoid the following situation.
+   * <p>
+   * If we are writing 10.1 G of data with a target file size of 1G we would end up with
+   * 11 files, one of which would only have 0.1g. This would most likely be less preferable to
+   * 10 files each of which was 1.01g. So here we decide whether to round up or round down
+   * based on what the estimated average file size will be if we ignore the remainder (0.1g). If
+   * the new file size is less than 10% greater than the target file size then we will round down
+   * when determining the number of output files.
+   * @param totalSizeInBytes total data size for a file group
+   * @return the number of files this strategy should create
+   */
+  protected long numOutputFiles(long totalSizeInBytes) {

Review comment:
       This logic looks correct to me.
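
       For reference, here is a minimal sketch of the rounding rule as described in the Javadoc above. It is not the method body from this PR, which is not quoted in this message. The class name `OutputFileCountSketch` and the two-argument static signature are hypothetical and chosen only to keep the example self-contained. The sketch applies only the 10% rule stated in the Javadoc and assumes no other constraints (such as minimum or maximum file size options).

```java
// Hypothetical, standalone illustration of the rounding rule described in the Javadoc.
// Not the actual BinPackStrategy implementation.
final class OutputFileCountSketch {

  private OutputFileCountSketch() {
  }

  static long numOutputFiles(long totalSizeInBytes, long targetFileSize) {
    if (targetFileSize <= 0) {
      throw new IllegalArgumentException("targetFileSize must be positive");
    }

    // Anything that fits in one target-sized file becomes a single file.
    if (totalSizeInBytes <= targetFileSize) {
      return 1;
    }

    long countRoundedDown = totalSizeInBytes / targetFileSize;
    long remainder = totalSizeInBytes % targetFileSize;

    // Exact multiple of the target size: there is no remainder file to worry about.
    if (remainder == 0) {
      return countRoundedDown;
    }

    // Average file size if the remainder is spread over the rounded-down file count.
    long avgSizeIfRoundedDown = totalSizeInBytes / countRoundedDown;

    // Round down only if the resulting files are less than 10% larger than the target.
    if (avgSizeIfRoundedDown < 1.1 * targetFileSize) {
      return countRoundedDown;
    }

    // Otherwise keep the extra file for the remainder.
    return countRoundedDown + 1;
  }
}
```

       With the Javadoc's example (about 10.1 G of data and a 1 G target), the average size after rounding down is about 1.01 G, which is under the 1.1 G threshold, so the sketch returns 10 files rather than 11. With 1.5 G of data and the same target, rounding down would give a single 1.5 G file, which exceeds the threshold, so it returns 2.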




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


