difin commented on code in PR #5540:
URL: https://github.com/apache/hive/pull/5540#discussion_r1955012676


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergCompactionUtil.java:
##########
@@ -94,4 +105,48 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath)
     return Lists.newArrayList(CloseableIterable.transform(filteredDeletesScanTasks,
         t -> ((PositionDeletesScanTask) t).file()));
   }
+
+  /**
+   * Returns target file size as following:
+   * In case of Minor compaction:
+   *  1. When COMPACTION_FILE_SIZE_THRESHOLD is defined, returns it.
+   *  2. Otherwise, calculates the file size threshold as:
+   *       COMPACTION_FILE_SIZE_THRESHOLD * TableProperties.HIVE_ICEBERG_COMPACTION_TARGET_FILE_SIZE
+   *     This makes Compaction evaluator consider data files with size less than file size threshold as undersized
+   *     segment files eligible for minor compaction (as per Amoro compaction evaluator, which is minor compaction
+   *     in Hive).
+   * In case of Major compaction returns -1.
+   * @param ci the compaction info
+   * @param conf Hive configuration
+   */
+  public static long getFileSizeThreshold(CompactionInfo ci, HiveConf conf) {

Review Comment:
   These are different things.
   
   Amoro has the configs ‘self-optimizing.target-size’ and ‘self-optimizing.min-target-size-ratio’.
   minFileSizeBytes is calculated as ‘self-optimizing.target-size’ * ‘self-optimizing.min-target-size-ratio’.
   
   For example:
   self-optimizing.target-size: 128 MB
   self-optimizing.fragment-ratio: ⅛
   minFileSizeBytes = 128 MB * ⅛ = 16 MB
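   
   The arithmetic in the example can be sketched as follows. This is only an illustration: the class and method names are hypothetical, not Amoro or Hive APIs, and the ratio is assumed to be passed as a plain fraction of the target size.

```java
// Hypothetical helper, not part of Amoro or Hive: it only reproduces the
// "target size * ratio" arithmetic from the example.
final class MinFileSizeSketch {
  private static final long MB = 1024L * 1024L;

  /**
   * Returns the size threshold in bytes as targetSizeBytes * ratio.
   * The ratio is assumed to be a fraction in (0, 1].
   */
  public static long minFileSizeBytes(long targetSizeBytes, double ratio) {
    if (targetSizeBytes <= 0 || ratio <= 0.0 || ratio > 1.0) {
      throw new IllegalArgumentException("target size must be positive and ratio must be in (0, 1]");
    }
    return (long) (targetSizeBytes * ratio);
  }

  public static void main(String[] args) {
    long targetSizeBytes = 128 * MB;   // target size from the example
    double ratio = 1.0 / 8;            // ratio from the example
    long minBytes = minFileSizeBytes(targetSizeBytes, ratio);
    System.out.println(minBytes / MB + " MB"); // prints "16 MB"
  }
}
```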
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

