SourabhBadhya commented on code in PR #5251: URL: https://github.com/apache/hive/pull/5251#discussion_r1629517880
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -136,6 +145,7 @@ public void commitTask(TaskAttemptContext originalContext) throws IOException {
       ExecutorService tableExecutor = tableExecutor(jobConf, outputs.size());
       try {
         // Generates commit files for the target tables in parallel
+        Collection<Path> finalMergedPaths = new ConcurrentLinkedQueue<>(mergedPaths);

Review Comment:
   1. Modified it to a list. Done.
   2. This is done to ensure that the information about the input files used for the merge is retained. It is null in most cases, except for merge.

##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -459,9 +474,14 @@ private void commitTable(FileIO io, ExecutorService executor, OutputTable output
         deleteFiles.addAll(writeResults.deleteFiles());
         replacedDataFiles.addAll(writeResults.replacedDataFiles());
         referencedDataFiles.addAll(writeResults.referencedDataFiles());
+        mergedAndDeletedFiles.addAll(writeResults.mergedAndDeletedFiles());
       }

-      FilesForCommit filesForCommit = new FilesForCommit(dataFiles, deleteFiles, replacedDataFiles, referencedDataFiles);
+      dataFiles.removeIf(dataFile -> mergedAndDeletedFiles.contains(new Path(String.valueOf(dataFile.path()))));

Review Comment:
   While writing data, there are multiple `jobContexts`. The files from one `jobContext` can act as the input files for a merge task in another `jobContext`. Hence, they are resolved during the commit phase.

##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/IcebergAcidUtil.java:
##########
@@ -203,7 +204,7 @@ public static long parseFilePosition(Record rec) {
     return rec.get(FILE_READ_META_COLS.get(MetadataColumns.ROW_POSITION), Long.class);
   }

-  public static long computeHash(StructProjection struct) {
+  public static long computeHash(StructLike struct) {

Review Comment:
   Removed. Done.
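For readers following the second comment, here is a minimal sketch of why the filtering happens at commit time rather than per job context. It is not the PR's code: `MergeCommitSketch`, `WriteResults`, and `dataFilesToCommit` are hypothetical names, and plain strings stand in for the Iceberg `DataFile` and Hadoop `Path` types. The assumption is that only once the results of all job contexts are collected can a data file written by one context be recognised as a merge input recorded by another.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class MergeCommitSketch {

  // Hypothetical stand-in for the per-job-context write results (not the real Hive type).
  static final class WriteResults {
    final List<String> dataFiles;
    final List<String> mergedAndDeletedFiles;

    WriteResults(List<String> dataFiles, List<String> mergedAndDeletedFiles) {
      this.dataFiles = dataFiles;
      this.mergedAndDeletedFiles = mergedAndDeletedFiles;
    }
  }

  // Collects results across all job contexts first, then drops every data file
  // that some context recorded as a merged (and therefore deleted) input.
  static List<String> dataFilesToCommit(List<WriteResults> perJobContext) {
    List<String> dataFiles = new ArrayList<>();
    Set<String> mergedAndDeletedFiles = new HashSet<>();
    for (WriteResults results : perJobContext) {
      dataFiles.addAll(results.dataFiles);
      mergedAndDeletedFiles.addAll(results.mergedAndDeletedFiles);
    }
    dataFiles.removeIf(mergedAndDeletedFiles::contains);
    return dataFiles;
  }

  public static void main(String[] args) {
    // One context wrote two small files; another merged them into a single file.
    WriteResults writeContext =
        new WriteResults(Arrays.asList("a.parquet", "b.parquet"), Collections.emptyList());
    WriteResults mergeContext =
        new WriteResults(Arrays.asList("merged.parquet"), Arrays.asList("a.parquet", "b.parquet"));

    // prints [merged.parquet]
    System.out.println(dataFilesToCommit(Arrays.asList(writeContext, mergeContext)));
  }
}
```

Filtering inside a single context would miss `a.parquet` and `b.parquet` here, because the context that wrote them has no record that they were later merged.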
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: gitbox-unsubscr...@hive.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org