zhangbutao commented on code in PR #5052:
URL: https://github.com/apache/hive/pull/5052#discussion_r1475468748
##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java:
##########
@@ -549,16 +550,27 @@ private void commitOverwrite(Table table, String branchName, long startTime, Fil
if (!results.dataFiles().isEmpty()) {
Transaction transaction = table.newTransaction();
if (rewritePolicy == RewritePolicy.ALL_PARTITIONS) {
- DeleteFiles delete = transaction.newDelete();
- delete.deleteFromRowFilter(Expressions.alwaysTrue());
- delete.commit();
- }
- ReplacePartitions overwrite = transaction.newReplacePartitions();
- results.dataFiles().forEach(overwrite::addFile);
- if (StringUtils.isNotEmpty(branchName)) {
- overwrite.toBranch(HiveUtils.getTableSnapshotRef(branchName));
+
+ List<DataFile> existingDataFiles = Lists.newArrayList();
+ List<DeleteFile> existingDeleteFiles = Lists.newArrayList();
+ IcebergTableUtil.getFiles(table, existingDataFiles, existingDeleteFiles);
Review Comment:
Got it, thanks.
It seems this change involves a tradeoff:
(two commits) vs. (a single commit that loops over and stores all existing data
and delete files)
I know that committing twice generates two snapshots, so users reading the
table between the two commits may see inconsistent data.
But with this change, if the original Iceberg table has a very large number of
small data and delete files, we need to loop over all of them and store them in
the `List`. Could this put memory pressure on HS?
I am not sure which approach is best. ;(
Could others share their thoughts? @deniskuzZ @ayushtkn @SourabhBadhya
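
To make the memory concern concrete, here is a minimal sketch of the two ways a single commit could consume the existing files. Everything in it is hypothetical: `PendingCommit`, `materializeThenDelete`, `streamDelete`, and `syntheticFiles` are stand-ins written for illustration, not Iceberg or Hive APIs, and file paths stand in for `DataFile`/`DeleteFile` objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Illustration only: none of these types exist in Iceberg or Hive.
final class StreamingReplaceSketch {

  /** Hypothetical stand-in for a pending single-commit operation. */
  static final class PendingCommit {
    int deleted = 0;

    void deleteFile(String path) {
      deleted++; // a real operation would record the file for removal
    }
  }

  /** Mirrors the diff: collect every existing file in a List, then hand them over. */
  static int materializeThenDelete(Iterator<String> files, PendingCommit commit) {
    List<String> buffered = new ArrayList<>();
    files.forEachRemaining(buffered::add);
    buffered.forEach(commit::deleteFile);
    return buffered.size(); // size of the intermediate list held in memory
  }

  /** Alternative: forward each file as it is produced, with no intermediate list. */
  static int streamDelete(Iterator<String> files, PendingCommit commit) {
    while (files.hasNext()) {
      commit.deleteFile(files.next());
    }
    return 0; // nothing was buffered by this method
  }

  /** Lazily produces n synthetic file paths, standing in for a table scan. */
  static Iterator<String> syntheticFiles(int n) {
    return new Iterator<String>() {
      private int i = 0;

      @Override
      public boolean hasNext() {
        return i < n;
      }

      @Override
      public String next() {
        if (i >= n) {
          throw new NoSuchElementException();
        }
        return "data-" + (i++) + ".parquet";
      }
    };
  }

  public static void main(String[] args) {
    PendingCommit a = new PendingCommit();
    PendingCommit b = new PendingCommit();
    int bufferedA = materializeThenDelete(syntheticFiles(1000), a);
    int bufferedB = streamDelete(syntheticFiles(1000), b);
    System.out.println("materialized: buffered=" + bufferedA + ", deleted=" + a.deleted);
    System.out.println("streamed:     buffered=" + bufferedB + ", deleted=" + b.deleted);
  }
}
```

Both variants register the same files with the pending commit; the second only avoids the extra `List`. Whether the real commit operation keeps its own references to every file is a separate question that this sketch does not answer, so streaming may reduce the pressure rather than remove it.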
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]