nsivabalan commented on code in PR #13383:
URL: https://github.com/apache/hudi/pull/13383#discussion_r2129012892
##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/compact/plan/generators/HoodieLogCompactionPlanGenerator.java:
##########
@@ -88,33 +87,28 @@ protected boolean filterLogCompactionOperations() {
}
/**
- * Can schedule logcompaction if log files count is greater than 4 or total log blocks is greater than 4.
+ * Can schedule logcompaction if log files count or total log blocks is greater than the configured threshold.
* @param fileSlice File Slice under consideration.
+ * @param instantRange Range of valid instants.
* @return Boolean value that determines whether log compaction will be scheduled or not.
*/
- private boolean isFileSliceEligibleForLogCompaction(FileSlice fileSlice,
String maxInstantTime,
+ private boolean isFileSliceEligibleForLogCompaction(FileSlice fileSlice,
Option<InstantRange>
instantRange) {
- LOG.info("Checking if fileId " + fileSlice.getFileId() + " and partition "
- + fileSlice.getPartitionPath() + " eligible for log compaction.");
+ LOG.info("Checking if fileId {} and partition {} eligible for log compaction.", fileSlice.getFileId(), fileSlice.getPartitionPath());
HoodieTableMetaClient metaClient = hoodieTable.getMetaClient();
- HoodieUnMergedLogRecordScanner scanner = HoodieUnMergedLogRecordScanner.newBuilder()
- .withStorage(metaClient.getStorage())
- .withBasePath(hoodieTable.getMetaClient().getBasePath())
- .withLogFilePaths(fileSlice.getLogFiles()
- .sorted(HoodieLogFile.getLogFileComparator())
- .map(file -> file.getPath().toString())
- .collect(Collectors.toList()))
- .withLatestInstantTime(maxInstantTime)
- .withInstantRange(instantRange)
- .withBufferSize(writeConfig.getMaxDFSStreamBufferSize())
- .withOptimizedLogBlocksScan(true)
- .withRecordMerger(writeConfig.getRecordMerger())
- .withTableMetaClient(metaClient)
- .build();
- scanner.scan(true);
+ long numLogFiles = fileSlice.getLogFiles().count();
+ if (numLogFiles >= writeConfig.getLogCompactionBlocksThreshold()) {
Review Comment:
I was wrong. We could have one data block and one delete block in the same
log file.
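
A minimal sketch of that scenario, in case it helps: a single log file that holds two blocks, so the file count and the block count differ. `LogFile`, `BlockType`, and both counting helpers are hypothetical stand-ins for illustration, not Hudi classes or APIs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LogBlockCountSketch {

  // Hypothetical block kinds for illustration; not Hudi's block type enum.
  enum BlockType { DATA, DELETE }

  // Hypothetical stand-in for a log file: only the blocks it contains.
  static final class LogFile {
    final List<BlockType> blocks;

    LogFile(BlockType... blocks) {
      this.blocks = Arrays.asList(blocks);
    }
  }

  // Counts log files in the slice.
  static long countLogFiles(List<LogFile> logFiles) {
    return logFiles.size();
  }

  // Counts blocks across all log files in the slice.
  static long countLogBlocks(List<LogFile> logFiles) {
    return logFiles.stream().mapToLong(f -> f.blocks.size()).sum();
  }

  public static void main(String[] args) {
    // One log file holding a data block and a delete block.
    List<LogFile> slice =
        Collections.singletonList(new LogFile(BlockType.DATA, BlockType.DELETE));
    System.out.println("log files:  " + countLogFiles(slice));  // prints 1
    System.out.println("log blocks: " + countLogBlocks(slice)); // prints 2
  }
}
```

With a file-count check alone, this slice would be counted as 1 even though it carries 2 blocks.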