lokeshj1703 commented on code in PR #13591:
URL: https://github.com/apache/hudi/pull/13591#discussion_r2228545794
##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java:
##########
@@ -240,15 +241,17 @@ storage, new StoragePath(logFilePathPattern)).stream()
       HoodieFileGroupId fileGroupId = new HoodieFileGroupId(FSUtils.getRelativePartitionPath(HoodieCLI.getTableMetaClient().getBasePath(), firstLogFile), FSUtils.getFileIdFromLogPath(firstLogFile));
       FileSlice fileSlice = new FileSlice(fileGroupId, HoodieTimeline.INIT_INSTANT_TS, null, logFilePaths.stream()
           .map(l -> new HoodieLogFile(new StoragePath(l))).collect(Collectors.toList()));
+      TypedProperties fileGroupReaderProperties = buildFileGroupReaderProperties();
       try (HoodieFileGroupReader<IndexedRecord> fileGroupReader = HoodieFileGroupReader.<IndexedRecord>newBuilder()
           .withReaderContext(readerContext)
           .withHoodieTableMetaClient(HoodieCLI.getTableMetaClient())
           .withFileSlice(fileSlice)
           .withDataSchema(readerSchema)
           .withRequestedSchema(readerSchema)
           .withLatestCommitTime(client.getActiveTimeline().getCommitAndReplaceTimeline().lastInstant().map(HoodieInstant::requestedTime).orElse(HoodieInstantTimeGenerator.getCurrentInstantTimeStr()))
-          .withProps(buildFileGroupReaderProperties())
+          .withProps(fileGroupReaderProperties)
           .withShouldUseRecordPosition(false)
+          .withEnableOptimizedLogBlockScan(getEnableOptimizedLogBlocksScan(fileGroupReaderProperties))
Review Comment:
I think we are always setting this to the default value. Should we use the same here and remove the other changes?
`ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN.defaultValue()`
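The pattern the reviewer is describing can be sketched as follows: when the property is never set on the reader properties, the helper just returns the config's default, so passing `ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN.defaultValue()` directly would be equivalent. This is a minimal stand-in sketch, not Hudi's actual API: `java.util.Properties` replaces `TypedProperties`, and the key name and default are hypothetical.

```java
import java.util.Properties;

public class DefaultFlagSketch {
    // Hypothetical stand-ins for Hudi's ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN ConfigProperty.
    static final String KEY = "hoodie.optimized.log.blocks.scan.enable";
    static final boolean DEFAULT_VALUE = false;

    // Mirrors a helper like getEnableOptimizedLogBlocksScan(props):
    // read the flag from the properties, falling back to the default.
    static boolean getEnableOptimizedLogBlocksScan(Properties props) {
        return Boolean.parseBoolean(props.getProperty(KEY, String.valueOf(DEFAULT_VALUE)));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        // Nothing was ever set, so the helper only ever returns DEFAULT_VALUE here,
        // which is the reviewer's argument for using defaultValue() directly.
        System.out.println(getEnableOptimizedLogBlocksScan(props)); // prints "false"
    }
}
```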
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/io/FileGroupReaderBasedAppendHandle.java:
##########
@@ -85,8 +85,10 @@ public void doAppend() {
     // Initializes the record iterator, log compaction requires writing the deletes into the delete block of the resulting log file.
     try (HoodieFileGroupReader<T> fileGroupReader = HoodieFileGroupReader.<T>newBuilder().withReaderContext(readerContext).withHoodieTableMetaClient(hoodieTable.getMetaClient())
         .withLatestCommitTime(instantTime).withPartitionPath(partitionPath).withLogFiles(logFiles).withBaseFileOption(Option.empty()).withDataSchema(writeSchemaWithMetaFields)
-        .withRequestedSchema(writeSchemaWithMetaFields).withEnableOptimizedLogBlockScan(true).withInternalSchema(internalSchemaOption).withProps(props).withEmitDelete(true)
-        .withShouldUseRecordPosition(usePosition).withSortOutput(hoodieTable.requireSortedRecords()).build()) {
+        .withRequestedSchema(writeSchemaWithMetaFields).withInternalSchema(internalSchemaOption).withProps(props).withEmitDelete(true)
+        .withShouldUseRecordPosition(usePosition).withSortOutput(hoodieTable.requireSortedRecords())
+        // instead of using config.enableOptimizedLogBlocksScan(), we set to true as log compaction blocks only supported in scanV2
+        .withEnableOptimizedLogBlockScan(true).build()) {
Review Comment:
I did not get why we are setting it to true here. In most other places it is set via a config.
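The tension the reviewer raises can be sketched as two call-site patterns: most readers thread the flag through from config, while this log-compaction path pins it to `true` because, per the diff comment, log-compaction blocks are only readable by the V2 (optimized) scan. The builder and names below are simplified stand-ins, not Hudi's real `HoodieFileGroupReader` API.

```java
public class ScanModeSketch {
    // Minimal stand-in for a reader builder with the optimized-scan flag.
    static class ReaderBuilder {
        boolean optimizedLogBlockScan;

        ReaderBuilder withEnableOptimizedLogBlockScan(boolean enable) {
            this.optimizedLogBlockScan = enable;
            return this;
        }

        String build() {
            return "scan=" + (optimizedLogBlockScan ? "V2" : "V1");
        }
    }

    // The common pattern: the flag comes from user/table config.
    static String fromConfig(boolean configuredValue) {
        return new ReaderBuilder().withEnableOptimizedLogBlockScan(configuredValue).build();
    }

    // The log-compaction call site: pinned to true because the blocks
    // it must read are only supported by the V2 scanner.
    static String forLogCompaction() {
        return new ReaderBuilder().withEnableOptimizedLogBlockScan(true).build();
    }

    public static void main(String[] args) {
        System.out.println(fromConfig(false));   // prints "scan=V1"
        System.out.println(forLogCompaction());  // prints "scan=V2"
    }
}
```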
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -895,7 +895,7 @@ public static <T> HoodieData<HoodieRecord> convertMetadataToRecordIndexRecords(H
         .collect(Collectors.toList());
     // Extract revived and deleted keys
     Pair<Set<String>, Set<String>> revivedAndDeletedKeys = getRevivedAndDeletedKeysFromMergedLogs(dataTableMetaClient, instantTime, allLogFilePaths, finalWriterSchemaOpt,
-        currentLogFilePaths, partitionPath, readerContextFactory.getContext());
+        currentLogFilePaths, partitionPath, readerContextFactory.getContext(), metadataConfig);
Review Comment:
This should use the write config property since it is reading the data files
##########
hudi-spark-datasource/hudi-spark/src/main/scala/org/apache/spark/sql/hudi/command/procedures/PartitionBucketIndexManager.scala:
##########
@@ -195,7 +196,8 @@ class PartitionBucketIndexManager extends BaseProcedure
     logInfo("Perform OVERWRITE with dry-run disabled.")
     val partitionsToRescale = rescalePartitionsMap.keys
     // get all fileSlices need to read
-    val allFilesMap = FSUtils.getFilesInPartitions(context, metaClient, HoodieMetadataConfig.newBuilder.enable(mdtEnable).build,
+    val metadataConfig = HoodieMetadataConfig.newBuilder.enable(mdtEnable).build
+    val allFilesMap = FSUtils.getFilesInPartitions(context, metaClient, metadataConfig,
Review Comment:
Redundant change?
##########
hudi-cli/src/main/java/org/apache/hudi/cli/commands/HoodieLogFileCommand.java:
##########
@@ -240,15 +241,17 @@ storage, new StoragePath(logFilePathPattern)).stream()
       HoodieFileGroupId fileGroupId = new HoodieFileGroupId(FSUtils.getRelativePartitionPath(HoodieCLI.getTableMetaClient().getBasePath(), firstLogFile), FSUtils.getFileIdFromLogPath(firstLogFile));
       FileSlice fileSlice = new FileSlice(fileGroupId, HoodieTimeline.INIT_INSTANT_TS, null, logFilePaths.stream()
           .map(l -> new HoodieLogFile(new StoragePath(l))).collect(Collectors.toList()));
+      TypedProperties fileGroupReaderProperties = buildFileGroupReaderProperties();
Review Comment:
NIT: We can probably call it defaultFileGroupReaderProperties
##########
hudi-common/src/main/java/org/apache/hudi/metadata/HoodieTableMetadataUtil.java:
##########
@@ -895,7 +895,7 @@ public static <T> HoodieData<HoodieRecord> convertMetadataToRecordIndexRecords(H
         .collect(Collectors.toList());
     // Extract revived and deleted keys
     Pair<Set<String>, Set<String>> revivedAndDeletedKeys = getRevivedAndDeletedKeysFromMergedLogs(dataTableMetaClient, instantTime, allLogFilePaths, finalWriterSchemaOpt,
-        currentLogFilePaths, partitionPath, readerContextFactory.getContext());
+        currentLogFilePaths, partitionPath, readerContextFactory.getContext(), metadataConfig);
Review Comment:
I think in general we need to verify which config to use in this class. From what I checked, we are reading data files here.
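The reviewer's point boils down to picking the reader properties based on what is being scanned: data-table log files should be read with the write config's properties, not the metadata-table config that happens to be in scope. A minimal sketch of that selection, with `java.util.Properties` and both config sources as hypothetical stand-ins:

```java
import java.util.Properties;

public class ConfigChoiceSketch {
    // Hypothetical stand-in for the metadata-table config in scope at the call site.
    static Properties metadataConfigProps() {
        Properties p = new Properties();
        p.setProperty("source", "metadata-config");
        return p;
    }

    // Hypothetical stand-in for the write config the reviewer suggests.
    static Properties writeConfigProps() {
        Properties p = new Properties();
        p.setProperty("source", "write-config");
        return p;
    }

    // When the file group reader is scanning *data* files, take the
    // write config's properties; only metadata-table reads should use
    // the metadata config.
    static Properties propsForRead(boolean readingDataFiles) {
        return readingDataFiles ? writeConfigProps() : metadataConfigProps();
    }

    public static void main(String[] args) {
        System.out.println(propsForRead(true).getProperty("source")); // prints "write-config"
    }
}
```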
##########
hudi-spark-datasource/hudi-spark/src/test/java/org/apache/hudi/client/functional/TestMetadataUtilRLIandSIRecordGeneration.java:
##########
@@ -728,6 +728,7 @@ Set<String> getRecordKeys(String partition, String baseInstantTime, String fileI
.withHoodieTableMetaClient(datasetMetaClient)
.withProps(properties)
.withEmitDelete(true)
+ // revisit if this needs to enable log optimized scan
Review Comment:
Does something need to be done here?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]