Jianhui Dong created HUDI-6120:
----------------------------------

             Summary: Streaming read will read basefile even if skipBaseFiles 
is set to true
                 Key: HUDI-6120
                 URL: https://issues.apache.org/jira/browse/HUDI-6120
             Project: Apache Hudi
          Issue Type: Improvement
            Reporter: Jianhui Dong


Check the code snippet of 
org.apache.hudi.common.table.view.AbstractTableFileSystemView#fetchAllLogsMergedFileSlice:
{code:java}
private Option<FileSlice> fetchAllLogsMergedFileSlice(HoodieFileGroup fileGroup, String maxInstantTime) {
  List<FileSlice> fileSlices = fileGroup.getAllFileSlicesBeforeOn(maxInstantTime)
      .collect(Collectors.toList());
  if (fileSlices.size() == 0) {
    return Option.empty();
  }
  if (fileSlices.size() == 1) {
    return Option.of(fileSlices.get(0));
  }
  final FileSlice latestSlice = fileSlices.get(0);
  FileSlice merged = new FileSlice(latestSlice.getPartitionPath(),
      latestSlice.getBaseInstantTime(), latestSlice.getFileId());

  // add log files from the latest slice to the earliest
  fileSlices.forEach(slice -> slice.getLogFiles().forEach(merged::addLogFile));
  return Option.of(merged);
}{code}
If only one file slice is fetched, the method returns that slice as-is, 
base file included. hudi-flink then creates a SkipMergeIterator or 
MergeIterator for the split, and both read the base file as well as the log 
files, so the base file is read even though skipBaseFiles is set to true.
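
To make the reported behavior concrete, here is a minimal, self-contained sketch. It does not use the real Hudi classes: Slice, mergedAsReported and mergedLogsOnly are hypothetical stand-ins that only model a file slice as an optional base file plus a list of log files. mergedAsReported mirrors the single-slice shortcut in the snippet above; mergedLogsOnly shows one possible adjustment (always building a log-only slice), which is an assumption for illustration and not necessarily the fix adopted in Hudi.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

// Simplified model of the issue; these are NOT the real Hudi classes.
class SkipBaseFileSketch {

  /** Hypothetical stand-in for a file slice: an optional base file plus log files. */
  static final class Slice {
    final String baseInstantTime;
    final String baseFile; // null when the slice carries no base file
    final List<String> logFiles = new ArrayList<>();

    Slice(String baseInstantTime, String baseFile, List<String> logs) {
      this.baseInstantTime = baseInstantTime;
      this.baseFile = baseFile;
      this.logFiles.addAll(logs);
    }
  }

  /** Mirrors the quoted logic: a single slice is returned as-is, base file included. */
  static Optional<Slice> mergedAsReported(List<Slice> slices) {
    if (slices.isEmpty()) {
      return Optional.empty();
    }
    if (slices.size() == 1) {
      return Optional.of(slices.get(0));
    }
    return Optional.of(logsOnly(slices));
  }

  /** Possible adjustment (assumption): build a log-only slice even for one input slice. */
  static Optional<Slice> mergedLogsOnly(List<Slice> slices) {
    if (slices.isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(logsOnly(slices));
  }

  /** Copies log files from every slice into a new slice that has no base file. */
  private static Slice logsOnly(List<Slice> slices) {
    Slice latest = slices.get(0);
    Slice merged = new Slice(latest.baseInstantTime, null, Collections.emptyList());
    slices.forEach(s -> merged.logFiles.addAll(s.logFiles));
    return merged;
  }

  public static void main(String[] args) {
    List<Slice> single = Collections.singletonList(
        new Slice("001", "base_001.parquet", Arrays.asList("log.1", "log.2")));
    // The shortcut leaks the base file to the reader.
    System.out.println(mergedAsReported(single).get().baseFile); // base_001.parquet
    // The log-only variant keeps both log files and drops the base file.
    System.out.println(mergedLogsOnly(single).get().baseFile);   // null
  }
}
```

With the log-only variant, a reader that builds its iterator from the returned slice has no base file to open, which is the behavior the skipBaseFiles flag suggests.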



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
