codope commented on a change in pull request #4840:
URL: https://github.com/apache/hudi/pull/4840#discussion_r810993836
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestMergeOnReadRollbackActionExecutor.java
##########
@@ -169,4 +318,13 @@ public void testRollbackWhenFirstCommitFail() throws Exception {
     client.rollback("001");
   }
 }
+
+  private void setUpDFS() throws IOException {
+    initDFS();
+    initSparkContexts();
+    //just generate tow partitions
Review comment:
nit: spell-check, `tow` should be `two`.
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java
##########
@@ -156,19 +156,21 @@ private void initKeyGenIfNeeded(boolean populateMetaFields) {
       LOG.info("RDD PreppedRecords was persisted at: " + inputRecordsRDD.getStorageLevel());
     }
-    WorkloadProfile profile = null;
+    WorkloadProfile workloadProfile = null;
     if (isWorkloadProfileNeeded()) {
       context.setJobStatus(this.getClass().getSimpleName(), "Building workload profile");
-      profile = new WorkloadProfile(buildProfile(inputRecordsRDD), operationType);
-      LOG.info("Workload profile :" + profile);
-      saveWorkloadProfileMetadataToInflight(profile, instantTime);
+      workloadProfile = new WorkloadProfile(buildProfile(inputRecordsRDD), operationType);
+      LOG.info("Input workload profile :" + workloadProfile);
     }
     // handle records update with clustering
     JavaRDD<HoodieRecord<T>> inputRecordsRDDWithClusteringUpdate = clusteringHandleUpdate(inputRecordsRDD);
     // partition using the insert partitioner
-    final Partitioner partitioner = getPartitioner(profile);
+    final Partitioner partitioner = getPartitioner(workloadProfile);
+    if (isWorkloadProfileNeeded()) {
+      saveWorkloadProfileMetadataToInflight(workloadProfile, instantTime);
Review comment:
Just for my understanding, why do we want to save the workload profile after
handling clustering updates? If `clusteringHandleUpdate` throws an exception,
the profile will never be saved in metadata, right?
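To make the ordering concern concrete, here is a minimal standalone sketch. The method names only mirror the diff and are hypothetical stand-ins, not the actual Hudi code: once the save step is moved after a step that can throw, the profile is never persisted on failure.

```java
// Standalone sketch; method names mirror the diff but are NOT Hudi APIs.
public class SaveOrderingSketch {
    static boolean profilePersisted = false;

    static void clusteringHandleUpdate(boolean fail) {
        if (fail) {
            throw new RuntimeException("clustering update failed");
        }
    }

    static void saveWorkloadProfileMetadataToInflight() {
        profilePersisted = true;
    }

    static void run(boolean clusteringFails) {
        // workload profile is built earlier; with the reordered flow the save
        // only happens after the clustering-update step succeeds
        clusteringHandleUpdate(clusteringFails);   // may throw
        saveWorkloadProfileMetadataToInflight();   // skipped if the line above throws
    }

    public static void main(String[] args) {
        try {
            run(true);
        } catch (RuntimeException e) {
            // exception propagates; the profile was never persisted
        }
        System.out.println(profilePersisted); // prints: false
    }
}
```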
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/UpsertPartitioner.java
##########
@@ -235,6 +243,7 @@ private void assignInserts(WorkloadProfile profile, HoodieEngineContext context)
       LOG.info("Total insert buckets for partition path " + partitionPath + " => " + insertBuckets);
       partitionPathToInsertBucketInfos.put(partitionPath, insertBuckets);
     }
+    profile.getOutputPartitionPathStatMap().putIfAbsent(partitionPath, outputWorkloadStats);
Review comment:
Why `putIfAbsent`? Should we not update the stat in this case?
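For reference, a minimal sketch of the semantic difference the question hinges on, using a plain `java.util.HashMap` with a made-up partition-path key: `putIfAbsent` is a no-op when the key is already mapped, whereas `put` replaces the existing value.

```java
import java.util.HashMap;
import java.util.Map;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        Map<String, Long> partitionStats = new HashMap<>();

        partitionStats.put("2016/03/15", 10L);
        partitionStats.putIfAbsent("2016/03/15", 99L); // no-op: key already mapped
        System.out.println(partitionStats.get("2016/03/15")); // prints: 10

        partitionStats.put("2016/03/15", 7L);          // put overwrites
        System.out.println(partitionStats.get("2016/03/15")); // prints: 7
    }
}
```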
##########
File path:
hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/commit/BaseSparkCommitActionExecutor.java
##########
@@ -156,19 +156,21 @@ private void initKeyGenIfNeeded(boolean populateMetaFields) {
       LOG.info("RDD PreppedRecords was persisted at: " + inputRecordsRDD.getStorageLevel());
     }
-    WorkloadProfile profile = null;
+    WorkloadProfile workloadProfile = null;
     if (isWorkloadProfileNeeded()) {
       context.setJobStatus(this.getClass().getSimpleName(), "Building workload profile");
-      profile = new WorkloadProfile(buildProfile(inputRecordsRDD), operationType);
-      LOG.info("Workload profile :" + profile);
-      saveWorkloadProfileMetadataToInflight(profile, instantTime);
+      workloadProfile = new WorkloadProfile(buildProfile(inputRecordsRDD), operationType);
+      LOG.info("Input workload profile :" + workloadProfile);
     }
     // handle records update with clustering
     JavaRDD<HoodieRecord<T>> inputRecordsRDDWithClusteringUpdate = clusteringHandleUpdate(inputRecordsRDD);
     // partition using the insert partitioner
-    final Partitioner partitioner = getPartitioner(profile);
+    final Partitioner partitioner = getPartitioner(workloadProfile);
+    if (isWorkloadProfileNeeded()) {
Review comment:
Alternatively, the condition can be changed to `if (workloadProfile != null)`
to make it more readable, IMO. But I'll leave it up to you.
##########
File path:
hudi-client/hudi-spark-client/src/test/java/org/apache/hudi/table/action/rollback/TestMergeOnReadRollbackActionExecutor.java
##########
@@ -139,6 +157,137 @@ public void testMergeOnReadRollbackActionExecutor(boolean isUsingMarkers) throws
     assertFalse(WriteMarkersFactory.get(cfg.getMarkersType(), table, "002").doesMarkerDirExist());
   }
+  @Test
+  public void testRollbackForCanIndexLogFile() throws IOException {
+    cleanupResources();
+    setUpDFS();
+    //1. prepare data and assert data result
+    //just generate one partitions
+    dataGen = new HoodieTestDataGenerator(new String[]{DEFAULT_FIRST_PARTITION_PATH});
+    HoodieWriteConfig cfg = HoodieWriteConfig.newBuilder().withPath(basePath).withSchema(HoodieTestDataGenerator.TRIP_EXAMPLE_SCHEMA)
+        .withParallelism(2, 2).withBulkInsertParallelism(2).withFinalizeWriteParallelism(2).withDeleteParallelism(2)
+        .withTimelineLayoutVersion(TimelineLayoutVersion.CURR_VERSION)
+        .withWriteStatusClass(MetadataMergeWriteStatus.class)
+        .withConsistencyGuardConfig(ConsistencyGuardConfig.newBuilder().withConsistencyCheckEnabled(true).build())
+        .withCompactionConfig(HoodieCompactionConfig.newBuilder().compactionSmallFileSize(1024 * 1024).build())
+        .withStorageConfig(HoodieStorageConfig.newBuilder().hfileMaxFileSize(1024 * 1024).parquetMaxFileSize(1024 * 1024).build())
+        .forTable("test-trip-table")
+        .withIndexConfig(HoodieIndexConfig.newBuilder().withIndexType(HoodieIndex.IndexType.INMEMORY).build())
+        .withEmbeddedTimelineServerEnabled(true).withFileSystemViewConfig(FileSystemViewStorageConfig.newBuilder()
+            .withEnableBackupForRemoteFileSystemView(false) // Fail test if problem connecting to timeline-server
+            .withStorageType(FileSystemViewStorageType.EMBEDDED_KV_STORE).build()).withRollbackUsingMarkers(false).withAutoCommit(false).build();
+
+    //1. prepare data
+    HoodieTestDataGenerator.writePartitionMetadata(fs, new String[]{DEFAULT_FIRST_PARTITION_PATH}, basePath);
+    SparkRDDWriteClient client = getHoodieWriteClient(cfg);
+    /**
Review comment:
Maybe we can remove this multi-line comment here and below?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]