zhangbutao commented on code in PR #4897:
URL: https://github.com/apache/hive/pull/4897#discussion_r1403851700


##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -850,12 +851,44 @@ public void executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
         IcebergTableUtil.performMetadataDelete(icebergTable, deleteMetadataSpec.getBranchName(),
             deleteMetadataSpec.getSarg());
         break;
+      case DELETE_ORPHAN_FILES:

Review Comment:
   This treats the `Delete Orphan Files` action as a metadata operation, so all
   of the deletion work is done by HS2.
   If the Iceberg table holds a lot of data and has many orphan files, will HS2
   suffer from performance issues? Can we launch a Tez task to delete the orphan
   files instead?



##########
iceberg/iceberg-handler/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergStorageHandler.java:
##########
@@ -850,12 +851,44 @@ public void executeOperation(org.apache.hadoop.hive.ql.metadata.Table hmsTable,
         IcebergTableUtil.performMetadataDelete(icebergTable, deleteMetadataSpec.getBranchName(),
             deleteMetadataSpec.getSarg());
         break;
+      case DELETE_ORPHAN_FILES:
+        int numDeleteThreads = conf.getInt(HiveConf.ConfVars.HIVE_ICEBERG_EXPIRE_SNAPSHOT_NUMTHREADS.varname,
+            HiveConf.ConfVars.HIVE_ICEBERG_EXPIRE_SNAPSHOT_NUMTHREADS.defaultIntVal);
+        AlterTableExecuteSpec.DeleteOrphanFilesDesc deleteOrphanFilesSpec =
+            (AlterTableExecuteSpec.DeleteOrphanFilesDesc) executeSpec.getOperationParams();
+        deleteOrphanFiles(icebergTable, deleteOrphanFilesSpec.getTimestampMillis(), numDeleteThreads);
+        break;
       default:
         throw new UnsupportedOperationException(
             String.format("Operation type %s is not supported", executeSpec.getOperationType().name()));
     }
   }
 
+  private void deleteOrphanFiles(Table icebergTable, long timestampMillis, int numThreads) {
+    ExecutorService deleteExecutorService = null;
+    try {
+      if (numThreads > 0) {
+        LOG.info("Executing delete orphan files on iceberg table {} with {} threads", icebergTable.name(), numThreads);
+        deleteExecutorService = getDeleteExecutorService(icebergTable.name(), numThreads);
+      }
+
+      HiveIcebergDeleteOrphanFiles deleteOrphanFiles = new HiveIcebergDeleteOrphanFiles(conf, icebergTable);
+      deleteOrphanFiles.olderThan(timestampMillis);
+      if (deleteExecutorService != null) {
+        deleteOrphanFiles.olderThan(timestampMillis);
+      }

Review Comment:
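   Line 876 and Line 878 contain the same call:
   `deleteOrphanFiles.olderThan(timestampMillis);`
   Why is it made twice? The call inside the `if (deleteExecutorService != null)`
   block never uses the executor.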
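
   If the second call was meant to hand the thread pool to the action, it might
   look roughly like the sketch below. This is only a guess at the intent:
   `OrphanFilesAction` and `configure` are hypothetical stand-ins for
   `HiveIcebergDeleteOrphanFiles` and the new helper, and I am assuming the real
   class offers an `executeDeleteWith(ExecutorService)` setter as Iceberg's
   `DeleteOrphanFiles` action interface does. Please check against the actual class.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class OrphanCleanupSketch {

  // Hypothetical stand-in for HiveIcebergDeleteOrphanFiles; only the two
  // builder-style setters relevant to this comment are modeled.
  static class OrphanFilesAction {
    long olderThanMillis = -1L;
    ExecutorService deleteExecutor; // null means delete on the calling thread

    OrphanFilesAction olderThan(long timestampMillis) {
      this.olderThanMillis = timestampMillis;
      return this;
    }

    // Assumed to mirror DeleteOrphanFiles#executeDeleteWith in Iceberg.
    OrphanFilesAction executeDeleteWith(ExecutorService executor) {
      this.deleteExecutor = executor;
      return this;
    }
  }

  // Sets the cutoff once and wires in the executor only when one was created.
  static OrphanFilesAction configure(long timestampMillis, ExecutorService executor) {
    OrphanFilesAction action = new OrphanFilesAction().olderThan(timestampMillis);
    if (executor != null) {
      action.executeDeleteWith(executor);
    }
    return action;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      OrphanFilesAction action = configure(1_700_000_000_000L, pool);
      System.out.println("executor wired: " + (action.deleteExecutor == pool));
    } finally {
      pool.shutdown();
    }
  }
}
```

   With `numThreads <= 0` no executor is created, so the action keeps its
   default single-threaded delete path and `olderThan` is still set exactly once.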



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

