n3nash commented on a change in pull request #2092:
URL: https://github.com/apache/hudi/pull/2092#discussion_r504819599



##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/helpers/DFSTestSuitePathSelector.java
##########
@@ -62,19 +67,26 @@ public DFSTestSuitePathSelector(TypedProperties props, Configuration hadoopConf)
         lastBatchId = 0;
         nextBatchId = 1;
       }
-
-      log.info("Using DFSTestSuitePathSelector, checkpoint: " + lastCheckpointStr + " sourceLimit: " + sourceLimit
-          + " lastBatchId: " + lastBatchId + " nextBatchId: " + nextBatchId);
       // obtain all eligible files for the batch
       List<FileStatus> eligibleFiles = new ArrayList<>();
       FileStatus[] fileStatuses = fs.globStatus(
           new Path(props.getString(Config.ROOT_INPUT_PATH_PROP), "*"));
+      // Say input data is as follow input/1, input/2, input/5 since 3,4 was rolled back and 5 is new generated data
+      // checkpoint from the latest commit metadata will be 2 since 3,4 has been rolled back. We need to set the
+      // next batch id correctly as 5 instead of 3
+      Optional<String> correctBatchIdDueToRollback = Arrays.stream(fileStatuses)
+          .map(f -> f.getPath().toString().split("/")[f.getPath().toString().split("/").length - 1])
+          .min((bid1, bid2) -> Integer.min(Integer.parseInt(bid1), Integer.parseInt(bid2)));

Review comment:
       Had some uncommitted changes; pushed them now.
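
       For reference, a minimal sketch of the selection the code comment in this hunk describes: with input/1, input/2, input/5 and a checkpoint of 2, the next batch id should be 5. This is an illustration only, not the code in the PR. The class and method names (NextBatchIdSketch, nextBatchId) are hypothetical, and it assumes every directory name under the input root is a plain integer batch id.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class NextBatchIdSketch {

  // Hypothetical helper (not part of the PR): returns the smallest batch id
  // strictly greater than lastBatchId, or empty if no newer batch exists.
  static OptionalInt nextBatchId(List<String> inputPaths, int lastBatchId) {
    return inputPaths.stream()
        // keep only the last path component, e.g. "input/5" -> "5"
        .map(p -> p.substring(p.lastIndexOf('/') + 1))
        .mapToInt(Integer::parseInt)
        // skip batches at or before the checkpoint
        .filter(id -> id > lastBatchId)
        .min();
  }

  public static void main(String[] args) {
    // Batches 3 and 4 were rolled back and deleted; 5 is newly generated data.
    List<String> paths = Arrays.asList("input/1", "input/2", "input/5");
    System.out.println(nextBatchId(paths, 2).getAsInt()); // prints 5
  }
}
```

       The sketch maps to an IntStream so that min() needs no comparator. If the ids are kept as strings, Stream.min expects a Comparator whose result is a sign (for example one built with Comparator.comparingInt(Integer::parseInt)), whereas Integer.min returns the smaller of its two arguments.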

##########
File path: hudi-integ-test/src/main/java/org/apache/hudi/integ/testsuite/dag/nodes/RollbackNode.java
##########
@@ -49,6 +54,11 @@ public void execute(ExecutionContext executionContext) throws Exception {
     Option<HoodieInstant> lastInstant = metaClient.getActiveTimeline().getCommitsTimeline().lastInstant();
     if (lastInstant.isPresent()) {
       log.info("Rolling back last instant {}", lastInstant.get());
+      log.info("Cleaning up generated data for the instant being rolled back {}", lastInstant.get());
+      ValidationUtils.checkArgument(executionContext.getWriterContext().getProps().getOrDefault(DFSPathSelector.Config.SOURCE_INPUT_SELECTOR,
+          DFSPathSelector.class.getName()).toString().equalsIgnoreCase(DFSTestSuitePathSelector.class.getName()), "Test Suite only supports DFSTestSuitePathSelector");
+      metaClient.getFs().delete(new Path(executionContext.getWriterContext().getCfg().inputBasePath,

Review comment:
       Done




