zhangbutao commented on code in PR #5052:
URL: https://github.com/apache/hive/pull/5052#discussion_r1475468748


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -549,16 +550,27 @@ private void commitOverwrite(Table table, String branchName, long startTime, Fil
     if (!results.dataFiles().isEmpty()) {
       Transaction transaction = table.newTransaction();
       if (rewritePolicy == RewritePolicy.ALL_PARTITIONS) {
-        DeleteFiles delete = transaction.newDelete();
-        delete.deleteFromRowFilter(Expressions.alwaysTrue());
-        delete.commit();
-      }
-      ReplacePartitions overwrite = transaction.newReplacePartitions();
-      results.dataFiles().forEach(overwrite::addFile);
-      if (StringUtils.isNotEmpty(branchName)) {
-        overwrite.toBranch(HiveUtils.getTableSnapshotRef(branchName));
+
+        List<DataFile> existingDataFiles = Lists.newArrayList();
+        List<DeleteFile> existingDeleteFiles = Lists.newArrayList();
+        IcebergTableUtil.getFiles(table, existingDataFiles, existingDeleteFiles);

Review Comment:
   Got it, thx.
   
   It seems this change involves a tradeoff:
   (two commits) vs. (a single commit that loops over, collects, and stores all existing data and delete files)
   
   I know two commits will generate two snapshots, and users reading the table between them may see inconsistent data.
   But with this change, if the original Iceberg table has a very large number of small data and delete files, we need to loop over all of them and store them in the `List`s. Could that put memory pressure on HS?
   I'm not sure which approach is better. ;(
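   To make the memory concern concrete, here is a minimal, library-free Java sketch. `FileRef`, `fakeListing`, `collectAll` and `streamCounts` are hypothetical names. They stand in for Iceberg `DataFile`/`DeleteFile` entries and a table file listing, and they are not Hive or Iceberg APIs. `collectAll` mirrors the single-commit approach of holding every existing file in lists, so heap use grows with the file count. `streamCounts` handles each entry as it is produced and keeps only counters. The sketch does not settle whether the real commit path could consume files in a streaming way; that is an assumption. By contrast, `deleteFromRowFilter(Expressions.alwaysTrue())` in the original code does not require the caller to enumerate files at all.

   ```java
   import java.util.ArrayList;
   import java.util.Iterator;
   import java.util.List;
   import java.util.NoSuchElementException;

   /** Hypothetical sketch: none of these types are Hive or Iceberg APIs. */
   public class FileListingSketch {

       /** Hypothetical stand-in for an Iceberg data or delete file entry. */
       record FileRef(String path, boolean isDelete) {}

       /** Lazily yields n file entries; nothing is stored up front. */
       static Iterator<FileRef> fakeListing(int n) {
           return new Iterator<>() {
               private int index = 0;

               @Override
               public boolean hasNext() {
                   return index < n;
               }

               @Override
               public FileRef next() {
                   if (!hasNext()) {
                       throw new NoSuchElementException();
                   }
                   int i = index++;
                   // Every 4th entry is treated as a delete file.
                   return new FileRef("file-" + i + ".parquet", i % 4 == 0);
               }
           };
       }

       /** Single-commit style: every entry stays reachable in a list (O(n) heap). */
       static List<List<FileRef>> collectAll(Iterator<FileRef> listing) {
           List<FileRef> dataFiles = new ArrayList<>();
           List<FileRef> deleteFiles = new ArrayList<>();
           while (listing.hasNext()) {
               FileRef f = listing.next();
               if (f.isDelete()) {
                   deleteFiles.add(f);
               } else {
                   dataFiles.add(f);
               }
           }
           return List.of(dataFiles, deleteFiles);
       }

       /** Streaming style: each entry is dropped after use; only counters remain. */
       static long[] streamCounts(Iterator<FileRef> listing) {
           long data = 0;
           long deletes = 0;
           while (listing.hasNext()) {
               if (listing.next().isDelete()) {
                   deletes++;
               } else {
                   data++;
               }
           }
           return new long[] {data, deletes};
       }

       public static void main(String[] args) {
           int n = 100_000;
           List<List<FileRef>> lists = collectAll(fakeListing(n));
           long[] counts = streamCounts(fakeListing(n));
           System.out.println("materialized: " + lists.get(0).size() + " data, "
                   + lists.get(1).size() + " delete");
           System.out.println("streamed:     " + counts[0] + " data, "
                   + counts[1] + " delete");
       }
   }
   ```

   For `fakeListing(1000)`, both passes see 750 data and 250 delete entries. Only the materializing pass keeps all of them reachable at the same time.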
   
   Could others share their thoughts? @deniskuzZ @ayushtkn @SourabhBadhya



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.