difin commented on code in PR #5540:
URL: https://github.com/apache/hive/pull/5540#discussion_r1955012676
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/compaction/IcebergCompactionUtil.java:
##########
@@ -94,4 +105,48 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath)
     return Lists.newArrayList(CloseableIterable.transform(filteredDeletesScanTasks, t -> ((PositionDeletesScanTask) t).file()));
   }
+
+  /**
+   * Returns target file size as following:
+   * In case of Minor compaction:
+   * 1. When COMPACTION_FILE_SIZE_THRESHOLD is defined, returns it.
+   * 2. Otherwise, calculates the file size threshold as:
+   * COMPACTION_FILE_SIZE_THRESHOLD * TableProperties.HIVE_ICEBERG_COMPACTION_TARGET_FILE_SIZE
+   * This makes Compaction evaluator consider data files with size less than file size threshold as undersized
+   * segment files eligible for minor compaction (as per Amoro compaction evaluator, which is minor compaction
+   * in Hive).
+   * In case of Major compaction returns -1.
+   * @param ci the compaction info
+   * @param conf Hive configuration
+   */
+  public static long getFileSizeThreshold(CompactionInfo ci, HiveConf conf) {

Review Comment:
These are different things.
Amoro has configs ‘self-optimizing.target-size’ and ‘self-optimizing.min-target-size-ratio’.
minFileSizeBytes is calculated as ‘self-optimizing.target-size’ * ‘self-optimizing.min-target-size-ratio’.

self-optimizing.target-size: 128 MB
self-optimizing.min-target-size-ratio: ¾
minFileSizeBytes = 128 MB * ¾ = 96 MB

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
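The Amoro arithmetic in the review comment (minFileSizeBytes = ‘self-optimizing.target-size’ * ‘self-optimizing.min-target-size-ratio’) can be sketched in Java as below. This is a minimal illustration under stated assumptions, not Amoro's or Hive's code: the class `AmoroMinFileSize` and method `minFileSizeBytes` are hypothetical names, "MB" is assumed to mean 1024 * 1024 bytes, and the config keys appear only in comments.

```java
// Hypothetical sketch of the size arithmetic from the review comment.
// Not Amoro's or Hive's API; names are illustrative only.
class AmoroMinFileSize {

  // Assumption: "MB" is read as a mebibyte (1024 * 1024 bytes).
  static final long MB = 1024L * 1024L;

  /**
   * Hypothetical helper: minFileSizeBytes = target-size * min-target-size-ratio.
   * Data files smaller than the result would count as undersized segment files.
   */
  static long minFileSizeBytes(long targetSizeBytes, double minTargetSizeRatio) {
    // Reject inputs that cannot describe a sensible threshold.
    if (targetSizeBytes <= 0 || minTargetSizeRatio <= 0 || minTargetSizeRatio > 1) {
      throw new IllegalArgumentException("expected targetSize > 0 and 0 < ratio <= 1");
    }
    // Truncate the product to a whole number of bytes.
    return (long) (targetSizeBytes * minTargetSizeRatio);
  }

  public static void main(String[] args) {
    long targetSize = 128 * MB; // 'self-optimizing.target-size'
    double ratio = 0.75;        // 'self-optimizing.min-target-size-ratio' (3/4)
    long minSize = minFileSizeBytes(targetSize, ratio);
    System.out.println(minSize / MB + " MB"); // prints "96 MB"
  }
}
```

The ratio is applied as a multiplier on the target size, so with the values quoted in the comment the undersized-file cutoff lands at 96 MB; truncating to a whole byte count is a choice made only for this sketch.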