guanziyue commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r780880230
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
.withOperationField(config.allowOperationMetadataField())
.withPartition(operation.getPartitionPath())
.build();
- if (!scanner.iterator().hasNext()) {
- scanner.close();
- return new ArrayList<>();
- }
Option<HoodieBaseFile> oldDataFileOpt =
operation.getBaseFile(metaClient.getBasePath(),
operation.getPartitionPath());
+    // Consider the following scenario: if all log blocks in this fileSlice are rolled back, the scanner is empty.
+    // But in this case, we still need to emit the base file. Otherwise, the following fileSlice will lose its base file.
+ if (!scanner.iterator().hasNext()) {
+ if (!oldDataFileOpt.isPresent()) {
+ scanner.close();
+ return new ArrayList<>();
+ } else {
+        // TODO: we may directly rename the original parquet file if there is no schema evolution/devolution
Review comment:
> If the file slice only has parquet files, why we still trigger compaction ?

Before we actually run the compaction, it is quite difficult to know that the new fileSlice only has a parquet file. There may be one or more log files that contain no valid log blocks.
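To make the intent of the change concrete, here is a minimal sketch of the decision being discussed: when the scanner turns out to be empty (all log blocks rolled back), the old code returned an empty list unconditionally, while the fix only does so when there is no base file to carry forward. The enum and method names below are illustrative stand-ins, not the actual Hudi API.

```java
// Hypothetical model of the compaction decision in HoodieCompactor.compact().
// CompactionAction and decide() are illustrative names, not real Hudi classes.
enum CompactionAction { SKIP, KEEP_BASE_FILE, MERGE }

final class CompactionDecision {
  static CompactionAction decide(boolean hasLogRecords, boolean baseFilePresent) {
    if (!hasLogRecords) {
      // All log blocks in this fileSlice were rolled back: nothing to merge.
      if (!baseFilePresent) {
        // Old behavior for both branches: return an empty record list.
        return CompactionAction.SKIP;
      }
      // New behavior: still produce the base file so the following
      // fileSlice does not lose it.
      return CompactionAction.KEEP_BASE_FILE;
    }
    // Normal path: merge log records with the (optional) base file.
    return CompactionAction.MERGE;
  }
}
```

This also reflects the reply above: the planner cannot tell in advance that the log files are all empty, so the empty-scanner case must be handled inside compaction rather than by skipping the operation up front.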
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]