Jianhui Dong created HUDI-6120:
----------------------------------
Summary: Streaming read will read basefile even if skipBaseFiles
is set to true
Key: HUDI-6120
URL: https://issues.apache.org/jira/browse/HUDI-6120
Project: Apache Hudi
Issue Type: Improvement
Reporter: Jianhui Dong
Check the code snippet of
org.apache.hudi.common.table.view.AbstractTableFileSystemView#fetchAllLogsMergedFileSlice:
{code:java}
private Option<FileSlice> fetchAllLogsMergedFileSlice(HoodieFileGroup fileGroup, String maxInstantTime) {
  List<FileSlice> fileSlices =
      fileGroup.getAllFileSlicesBeforeOn(maxInstantTime).collect(Collectors.toList());
  if (fileSlices.size() == 0) {
    return Option.empty();
  }
  if (fileSlices.size() == 1) {
    return Option.of(fileSlices.get(0));
  }
  final FileSlice latestSlice = fileSlices.get(0);
  FileSlice merged = new FileSlice(latestSlice.getPartitionPath(),
      latestSlice.getBaseInstantTime(),
      latestSlice.getFileId());
  // add log files from the latest slice to the earliest
  fileSlices.forEach(slice -> slice.getLogFiles().forEach(merged::addLogFile));
  return Option.of(merged);
}
{code}
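If only one file slice is fetched, the method returns that file slice as is,
including its base file. hudi-flink then creates a SkipMergeIterator or a
MergeIterator for the split, either of which reads both the base file and the
log files, so the base file is read even though skipBaseFiles is set to true.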
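
To make the single-slice case concrete, here is a minimal, self-contained sketch that does not depend on Hudi. The Slice class is a simplified stand-in for FileSlice, mergeAsQuoted mirrors the control flow of the snippet above, and mergeLogsOnly is a hypothetical variant (an assumption for illustration, not the actual or proposed Hudi change) that always builds a logs-only slice. All names in it are invented for this example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class LogsOnlySliceSketch {

  /** Simplified stand-in for Hudi's FileSlice (hypothetical, not the real class). */
  static final class Slice {
    final String baseInstantTime;
    final String baseFile; // null when the slice carries no base file
    final List<String> logFiles = new ArrayList<>();

    Slice(String baseInstantTime, String baseFile) {
      this.baseInstantTime = baseInstantTime;
      this.baseFile = baseFile;
    }
  }

  /** Mirrors the quoted method: a lone slice is returned untouched, base file included. */
  static Optional<Slice> mergeAsQuoted(List<Slice> slices) {
    if (slices.isEmpty()) {
      return Optional.empty();
    }
    if (slices.size() == 1) {
      return Optional.of(slices.get(0));
    }
    Slice merged = new Slice(slices.get(0).baseInstantTime, null);
    slices.forEach(s -> merged.logFiles.addAll(s.logFiles));
    return Optional.of(merged);
  }

  /** Hypothetical variant: never short-circuits, so the result has log files only. */
  static Optional<Slice> mergeLogsOnly(List<Slice> slices) {
    if (slices.isEmpty()) {
      return Optional.empty();
    }
    Slice merged = new Slice(slices.get(0).baseInstantTime, null);
    slices.forEach(s -> merged.logFiles.addAll(s.logFiles));
    return Optional.of(merged);
  }

  public static void main(String[] args) {
    Slice only = new Slice("001", "base_001.parquet");
    only.logFiles.add("log.1");
    List<Slice> single = Collections.singletonList(only);

    // The quoted logic hands back the slice with its base file attached.
    System.out.println("quoted:    " + mergeAsQuoted(single).get().baseFile);
    // The hypothetical variant drops the base file and keeps the log file.
    System.out.println("logs-only: " + mergeLogsOnly(single).get().baseFile);
  }
}
```

With a single slice, the first call still exposes base_001.parquet to the reader while the second exposes only log.1, which is the behavior a caller that sets skipBaseFiles presumably expects.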
--
This message was sent by Atlassian Jira
(v8.20.10#820010)