yihua commented on code in PR #11098:
URL: https://github.com/apache/hudi/pull/11098#discussion_r1597304231


##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java:
##########
@@ -1034,6 +1018,60 @@ private void validateRecordIndexContent(HoodieSparkEngineContext sparkEngineCont
     }
   }
 
+  @VisibleForTesting
+  JavaPairRDD<String, Pair<String, String>> getRecordLocationsFromFSBasedListing(HoodieSparkEngineContext sparkEngineContext,
+                                                                                 String basePath,
+                                                                                 String latestCompletedCommit) {
+    return sparkEngineContext.getSqlContext().read().format("hudi")
+        .option(DataSourceReadOptions.TIME_TRAVEL_AS_OF_INSTANT().key(), latestCompletedCommit)
+        .load(basePath)
+        .select(RECORD_KEY_METADATA_FIELD, PARTITION_PATH_METADATA_FIELD, FILENAME_METADATA_FIELD)
+        .toJavaRDD()
+        .mapToPair(row -> new Tuple2<>(row.getString(row.fieldIndex(RECORD_KEY_METADATA_FIELD)),
+            Pair.of(row.getString(row.fieldIndex(PARTITION_PATH_METADATA_FIELD)),
+                FSUtils.getFileId(row.getString(row.fieldIndex(FILENAME_METADATA_FIELD))))))
+        .cache();
+  }
+
+  @VisibleForTesting
+  JavaPairRDD<String, Pair<String, String>> getRecordLocationsFromRLI(HoodieSparkEngineContext sparkEngineContext,
+                                                                      String basePath,
+                                                                      String latestCompletedCommit) {
+    return sparkEngineContext.getSqlContext().read().format("hudi")
+        .load(getMetadataTableBasePath(basePath))

Review Comment:
   @nsivabalan One thing we can consider as a follow-up is using a time-travel query on the MDT as well. This might not be supported today, but it would be good to have for the validation.
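
To illustrate why reading both sides as of the same instant could matter for the validation, here is a minimal, self-contained sketch in plain Java. It does not use Hudi or Spark APIs: the class `RecordLocationDiff`, the helpers `loc` and `mismatchedKeys`, and the sample keys are all hypothetical, and the maps only stand in for the record key to (partition path, file ID) pairs that the two methods in the diff return.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class RecordLocationDiff {

  // Hypothetical helper: a (partitionPath, fileId) pair for one record key.
  static Map.Entry<String, String> loc(String partitionPath, String fileId) {
    return new SimpleImmutableEntry<>(partitionPath, fileId);
  }

  // Returns the record keys that are missing from one side or whose
  // (partitionPath, fileId) location differs between the two sides.
  static Set<String> mismatchedKeys(Map<String, Map.Entry<String, String>> fromListing,
                                    Map<String, Map.Entry<String, String>> fromRli) {
    Set<String> allKeys = new TreeSet<>(fromListing.keySet());
    allKeys.addAll(fromRli.keySet());
    Set<String> mismatched = new TreeSet<>();
    for (String key : allKeys) {
      Map.Entry<String, String> a = fromListing.get(key);
      Map.Entry<String, String> b = fromRli.get(key);
      if (a == null || !a.equals(b)) {
        mismatched.add(key);
      }
    }
    return mismatched;
  }

  public static void main(String[] args) {
    // Data-table read pinned to the latest completed commit (assumed instant t1).
    Map<String, Map.Entry<String, String>> listingAtT1 = new HashMap<>();
    listingAtT1.put("key1", loc("2024/05/01", "file-a"));

    // Index read without time travel: it also sees a record written after t1.
    Map<String, Map.Entry<String, String>> rliLatest = new HashMap<>(listingAtT1);
    rliLatest.put("key2", loc("2024/05/02", "file-b"));

    // Index read restricted to the same instant t1 (what time travel would give).
    Map<String, Map.Entry<String, String>> rliAtT1 = new HashMap<>(listingAtT1);

    System.out.println(mismatchedKeys(listingAtT1, rliLatest)); // prints [key2]
    System.out.println(mismatchedKeys(listingAtT1, rliAtT1));   // prints []
  }
}
```

In this toy setup the unpinned index read reports `key2` as a mismatch even though nothing is corrupted, while the read pinned to the same instant reports none. Whether the real MDT read can be pinned this way is exactly the open question raised above.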



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
