yihua commented on code in PR #11098:
URL: https://github.com/apache/hudi/pull/11098#discussion_r1597304231
##########
hudi-utilities/src/main/java/org/apache/hudi/utilities/HoodieMetadataTableValidator.java:
##########
@@ -1034,6 +1018,60 @@ private void validateRecordIndexContent(HoodieSparkEngineContext sparkEngineCont
     }
   }
 
+  @VisibleForTesting
+  JavaPairRDD<String, Pair<String, String>> getRecordLocationsFromFSBasedListing(HoodieSparkEngineContext sparkEngineContext,
+                                                                                 String basePath,
+                                                                                 String latestCompletedCommit) {
+    return sparkEngineContext.getSqlContext().read().format("hudi")
+        .option(DataSourceReadOptions.TIME_TRAVEL_AS_OF_INSTANT().key(), latestCompletedCommit)
+        .load(basePath)
+        .select(RECORD_KEY_METADATA_FIELD, PARTITION_PATH_METADATA_FIELD, FILENAME_METADATA_FIELD)
+        .toJavaRDD()
+        .mapToPair(row -> new Tuple2<>(row.getString(row.fieldIndex(RECORD_KEY_METADATA_FIELD)),
+            Pair.of(row.getString(row.fieldIndex(PARTITION_PATH_METADATA_FIELD)),
+                FSUtils.getFileId(row.getString(row.fieldIndex(FILENAME_METADATA_FIELD))))))
+        .cache();
+  }
+
+  @VisibleForTesting
+  JavaPairRDD<String, Pair<String, String>> getRecordLocationsFromRLI(HoodieSparkEngineContext sparkEngineContext,
+                                                                      String basePath,
+                                                                      String latestCompletedCommit) {
+    return sparkEngineContext.getSqlContext().read().format("hudi")
+        .load(getMetadataTableBasePath(basePath))
Review Comment:
@nsivabalan As a follow-up, we could consider using a time-travel query on
the MDT (metadata table) as well. This might not be supported, but it would be
good to have for the validation.