yihua commented on code in PR #12056:
URL: https://github.com/apache/hudi/pull/12056#discussion_r1792332106
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/functional/TestNewHoodieParquetFileFormat.java:
##########
@@ -114,25 +117,42 @@ protected void runIndividualComparison(String tableBasePath) {
  }

  protected void runIndividualComparison(String tableBasePath, String firstColumn, String... columns) {
- Dataset<Row> legacyDf = sparkSession.read().format("hudi")
- .option(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), "false")
- .load(tableBasePath);
- Dataset<Row> fileFormatDf = sparkSession.read().format("hudi")
- .option(HoodieReaderConfig.FILE_GROUP_READER_ENABLED.key(), "true")
- .load(tableBasePath);
- if (firstColumn.isEmpty()) {
- //df.except(df) does not work with map type cols
- legacyDf = legacyDf.drop("city_to_state");
- fileFormatDf = fileFormatDf.drop("city_to_state");
- } else {
- if (columns.length > 0) {
- legacyDf = legacyDf.select(firstColumn, columns);
- fileFormatDf = fileFormatDf.select(firstColumn, columns);
+ List<String> queryTypes = new ArrayList<>();
Review Comment:
Do we have tests covering RO queries on bootstrapped MOR tables with updates in
log files? Changing the validation alone may not be enough; we need to cover
such cases specifically (we might need to add preconditions checking that the MOR
table actually has updates in log files for such cases).
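The precondition suggested above could be sketched as below. This is a plain-Java model for illustration only: `FileSliceInfo` and `hasLogFileUpdates` are hypothetical stand-ins, not Hudi's actual `FileSlice` API or the test's helpers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a file slice: tracks only its log-file count.
class FileSliceInfo {
    final String fileId;
    final int logFileCount;

    FileSliceInfo(String fileId, int logFileCount) {
        this.fileId = fileId;
        this.logFileCount = logFileCount;
    }
}

public class MorPreconditionSketch {
    // True iff at least one file slice carries updates in log files,
    // i.e. the MOR table actually exercises the log-merging path.
    static boolean hasLogFileUpdates(List<FileSliceInfo> slices) {
        return slices.stream().anyMatch(s -> s.logFileCount > 0);
    }

    public static void main(String[] args) {
        List<FileSliceInfo> slices = Arrays.asList(
            new FileSliceInfo("fg-1", 0),
            new FileSliceInfo("fg-2", 2)); // fg-2 has updates in log files
        // Fail fast if the fixture does not cover the case under test.
        if (!hasLogFileUpdates(slices)) {
            throw new IllegalStateException(
                "test table has no log-file updates; RO-query case not covered");
        }
        System.out.println("precondition holds");
    }
}
```

The point of failing fast is that an RO-vs-snapshot comparison on a MOR table with empty log files would pass trivially without exercising the code path under review.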
##########
hudi-spark-datasource/hudi-spark-common/src/main/scala/org/apache/hudi/HoodieFileIndex.scala:
##########
@@ -183,9 +183,9 @@ case class HoodieFileIndex(spark: SparkSession,
      }).filter(slice => slice != null)
        .map(fileInfo => new FileStatus(fileInfo.getLength, fileInfo.isDirectory, 0, fileInfo.getBlockSize, fileInfo.getModificationTime, new Path(fileInfo.getPath.toUri)))
-    val c = fileSlices.filter(f => (includeLogFiles && f.getLogFiles.findAny().isPresent)
-      || (f.getBaseFile.isPresent && f.getBaseFile.get().getBootstrapBaseFile.isPresent)).
-      foldLeft(Map[String, FileSlice]()) { (m, f) => m + (f.getFileId -> f) }
+    val c = fileSlices.filter(f => f.hasBootstrapBase || (includeLogFiles && f.hasLogFiles))
Review Comment:
Is this problem Spark-only, or does it affect the file index in general? Is this
file index used only by Spark?
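For reference, the diff above only swaps in helper methods and reorders the disjuncts; the two predicate forms select the same slices. A minimal sketch checking this, using a hypothetical `Slice` model rather than Hudi's `FileSlice`:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical minimal model of a file slice, just enough for the predicate.
class Slice {
    final boolean hasLogFiles;
    final boolean hasBootstrapBase;

    Slice(boolean hasLogFiles, boolean hasBootstrapBase) {
        this.hasLogFiles = hasLogFiles;
        this.hasBootstrapBase = hasBootstrapBase;
    }
}

public class PredicateEquivalence {
    // Old form: log-file check first, then bootstrap-base check.
    static boolean oldPredicate(Slice s, boolean includeLogFiles) {
        return (includeLogFiles && s.hasLogFiles) || s.hasBootstrapBase;
    }

    // New form from the diff: bootstrap-base check first.
    static boolean newPredicate(Slice s, boolean includeLogFiles) {
        return s.hasBootstrapBase || (includeLogFiles && s.hasLogFiles);
    }

    public static void main(String[] args) {
        List<Slice> slices = Arrays.asList(
            new Slice(true, false), new Slice(false, true),
            new Slice(true, true), new Slice(false, false));
        // Exhaustively compare both forms over all flag/slice combinations.
        for (boolean includeLogFiles : new boolean[] {true, false}) {
            for (Slice s : slices) {
                if (oldPredicate(s, includeLogFiles) != newPredicate(s, includeLogFiles)) {
                    throw new AssertionError("predicates diverge");
                }
            }
        }
        System.out.println("predicates agree on all cases");
    }
}
```

So the behavioral question in this comment is about where the predicate runs (Spark-only file index or a shared one), not about the predicate rewrite itself.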
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]