[GitHub] [hudi] nsivabalan commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

GitBox Tue, 08 Feb 2022 17:29:00 -0800


nsivabalan commented on a change in pull request #4446:
URL: https://github.com/apache/hudi/pull/4446#discussion_r802192950




##########
File path: 
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -70,9 +71,13 @@
    */
   private int totalBuckets = 0;
   /**
-   * Stat for the current workload. Helps in determining inserts, upserts etc.
+   * Stat for the input workload. Describe the workload before being assigned 
buckets.
    */
-  private WorkloadProfile profile;
+  private WorkloadProfile inputProfile;
+  /**
+   * Stat for the execution workload. Describe the workload after being 
assigned buckets.
+   */
+  private HashMap<String, WorkloadStat> executionProfile = new HashMap<>();

Review comment:
       as suggested above, can we try to move this within WorkloadProfile only. 

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
         .withOperationField(config.allowOperationMetadataField())
         .withPartition(operation.getPartitionPath())
         .build();
-    if (!scanner.iterator().hasNext()) {
-      scanner.close();
-      return new ArrayList<>();
-    }
 
     Option<HoodieBaseFile> oldDataFileOpt =
         operation.getBaseFile(metaClient.getBasePath(), 
operation.getPartitionPath());
 
+    // Considering following scenario: if all log blocks in this fileSlice is 
rollback, it returns an empty scanner.
+    // But in this case, we need to give it a base file. Otherwise, it will 
lose base file in following fileSlice.
+    if (!scanner.iterator().hasNext()) {
+      if (!oldDataFileOpt.isPresent()) {
+        scanner.close();
+        return new ArrayList<>();
+      } else {
+        // TODO: we may directly rename original parquet file if there is not 
evolution/devolution of schema

Review comment:
       yes. it's not feasible to know upfront if all log blocks are invalid 
while planning compaction. 

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/Partitioner.java
##########
@@ -18,10 +18,14 @@
 
 package org.apache.hudi.table.action.commit;
 
+import org.apache.hudi.table.WorkloadProfile;
+
 import java.io.Serializable;
 
 public interface Partitioner extends Serializable {
   int getNumPartitions();
 
   int getPartition(Object key);
+
+  WorkloadProfile getExecutionWorkloadProfile();
 }

Review comment:
       wondering if we can augment existing WorkloadProfile only to add another 
instance of HashMap<String, WorkloadStat> partitionPathStatMap.
   
   ```
     /**
      * Input workload profile.
      */
     protected final HashMap<String, WorkloadStat> inputPartitionPathStatMap;
     /**
      * Execution workload profile.
      */
     protected final HashMap<String, WorkloadStat> outputPartitionPathStatMap;
   ```
   
   Let the UpsertPartitioner update the execution profile (a.k.a 
outputPartitionPathStatMap ) in the passed in workload profile itself. 
   And so CommitActionExecutors does not need to poll partitionor for execution 
profile since they already have an handle to WorkloadProfile.
   
   
   

##########
File path: 
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/HoodieCompactor.java
##########
@@ -182,14 +182,28 @@ public abstract void preCompact(
         .withOperationField(config.allowOperationMetadataField())
         .withPartition(operation.getPartitionPath())
         .build();
-    if (!scanner.iterator().hasNext()) {
-      scanner.close();
-      return new ArrayList<>();
-    }
 
     Option<HoodieBaseFile> oldDataFileOpt =
         operation.getBaseFile(metaClient.getBasePath(), 
operation.getPartitionPath());
 
+    // Considering following scenario: if all log blocks in this fileSlice is 
rollback, it returns an empty scanner.
+    // But in this case, we need to give it a base file. Otherwise, it will 
lose base file in following fileSlice.
+    if (!scanner.iterator().hasNext()) {
+      if (!oldDataFileOpt.isPresent()) {
+        scanner.close();
+        return new ArrayList<>();
+      } else {
+        // TODO: we may directly rename original parquet file if there is not 
evolution/devolution of schema

Review comment:
       yes, your understanding is right. 
   what i meant by "do these need to be cleaned up " is, why have commented out 
code from L197 to L203? do you want to remove them. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hudi] nsivabalan commented on a change in pull request #4446: [HUDI-2917] rollback insert data appended to log file when using Hbase Index

Reply via email to