zratkai commented on code in PR #5529: URL: https://github.com/apache/hive/pull/5529#discussion_r1846503165
########## iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergTableUtil.java: ########## @@ -458,6 +458,52 @@ public static List<DeleteFile> getDeleteFiles(Table table, String partitionPath) t -> ((PositionDeletesScanTask) t).file())); } + public static float getFileSizeRatio(Table table, String partitionPath, long fileSizeInBytesThreshold) { + long uncompactedFilesCount = getDataFileCount(table, partitionPath, fileSizeInBytesThreshold, true); + long compactedFilesCount = getDataFileCount(table, partitionPath, fileSizeInBytesThreshold, false); + + if (uncompactedFilesCount == 0) { + return 0; + } else if (compactedFilesCount == 0) { + return 1; + } else { + return uncompactedFilesCount * 1.0f / (uncompactedFilesCount + compactedFilesCount); + } + } + + private static long getDataFileCount(Table table, String partitionPath, long fileSizeInBytesThreshold, + boolean isLess) { + CloseableIterable<FileScanTask> fileScanTasks = + table.newScan().useSnapshot(table.currentSnapshot().snapshotId()).ignoreResiduals().planFiles(); + CloseableIterable<FileScanTask> filteredFileScanTasks = + CloseableIterable.filter(fileScanTasks, t -> { + DataFile file = t.asFileScanTask().file(); + return (!table.spec().isPartitioned() || + partitionPath == null && file.specId() != table.spec().specId() || + partitionPath != null && + table.specs().get(file.specId()).partitionToPath(file.partition()).equals(partitionPath)) && + (isLess ? file.fileSizeInBytes() < fileSizeInBytesThreshold : + file.fileSizeInBytes() >= fileSizeInBytesThreshold); + }); + return Lists.newArrayList(filteredFileScanTasks).size(); + } + + public static long countDeleteRecords(Table table, String partitionPath) { + Table deletesTable = + MetadataTableUtils.createMetadataTableInstance(table, MetadataTableType.POSITION_DELETES); + CloseableIterable<ScanTask> deletesScanTasks = deletesTable.newBatchScan().planFiles(); + CloseableIterable<ScanTask> filteredDeletesScanTasks = + CloseableIterable.filter(deletesScanTasks, t -> { + DeleteFile file = ((PositionDeletesScanTask) t).file(); + return !table.spec().isPartitioned() || Review Comment: It definitely confirms how important parenthesis is when there are multiple conditions in if. With or without parenthesis it could have a different result, as shown.Which means if it is done the wrong way it results in error. It also not readable for others who didn't developed it, just try to understand or modify later. Imagine someone has to add or remove a condition in the future and it leads to bugs. Please check Oracle's recommendation to this topic: https://www.oracle.com/java/technologies/javase/codeconventions-programmingpractices.html "10.5 Miscellaneous Practices 10.5.1 Parentheses It is generally a good idea to use parentheses liberally in expressions involving mixed operators to avoid operator precedence problems. Even if the operator precedence seems clear to you, it might not be to others—you shouldn’t assume that other programmers know precedence as well as you do. if (a == b && c == d) // AVOID! if ((a == b) && (c == d)) // RIGHT" Also some valuable experience from others for this topic. I can not express myself better then the comments in this : https://softwareengineering.stackexchange.com/questions/201175/should-i-fully-parenthesize-expressions-or-rely-on-precedence-rules "As for correctness, conditions without parentheses are a recipe for silly mistakes. When they happen, they can be bugs that are hard to find--because often an incorrect condition will behave correctly most of the time, and only occasionally fail." "Good developers strive to write code that is clear and correct. Parentheses in conditionals, even if they are not strictly required, help with both." "Furthermore, extra parentheses, just like indentations, whitespace, and other style standards, help visually organize the code in a logical way." "And even if you get it right, the next person to work on your code may not, either adding errors to the expression or misunderstanding your logic and thus adding errors elsewhere (as LarsH rightly points out)." -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For additional commands, e-mail: gitbox-h...@hive.apache.org