Chuang Lee created HUDI-4917:
--------------------------------

             Summary: Optimized the way to get HoodieBaseFile of 
loadColumnRangesFromFiles of Bloom Index
                 Key: HUDI-4917
                 URL: https://issues.apache.org/jira/browse/HUDI-4917
             Project: Apache Hudi
          Issue Type: Improvement
          Components: index
            Reporter: Chuang Lee
            Assignee: Chuang Lee
             Fix For: 0.13.0


When the Bloom Index runs loadColumnRangesFromFiles during the tagLocation 
process, the HoodieBaseFile is currently obtained by sending requests to the 
driver. When the data volume is large and the parallelism is high, these 
requests become a network bottleneck, resulting in very slow tagLocation.
However, the HoodieBaseFile can be obtained directly through 
HoodieIndexUtils.getLatestBaseFilesForAllPartitions() in 
loadColumnRangesFromFiles(), so this change can effectively improve the 
tagLocation performance of the Bloom Index.
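
To illustrate the difference, here is a minimal, self-contained sketch that 
does not use any Hudi classes. BaseFile, DriverView, and all method names 
below are hypothetical stand-ins, and the "requests" counter only models 
round trips to the driver. It compares looking up each base file separately 
with resolving all base files once and reusing them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class BaseFileLookupSketch {

    // Hypothetical stand-in for a base file and its record-key range (not the Hudi class).
    static final class BaseFile {
        final String partition;
        final String fileId;
        final String minKey;
        final String maxKey;

        BaseFile(String partition, String fileId, String minKey, String maxKey) {
            this.partition = partition;
            this.fileId = fileId;
            this.minKey = minKey;
            this.maxKey = maxKey;
        }
    }

    // Hypothetical stand-in for the driver-side view; counts simulated requests.
    static final class DriverView {
        private final List<BaseFile> files = new ArrayList<>();
        final AtomicInteger requests = new AtomicInteger();

        void add(BaseFile file) {
            files.add(file);
        }

        // One simulated request per (partition, fileId) lookup.
        BaseFile latestBaseFile(String partition, String fileId) {
            requests.incrementAndGet();
            for (BaseFile f : files) {
                if (f.partition.equals(partition) && f.fileId.equals(fileId)) {
                    return f;
                }
            }
            return null;
        }

        // One simulated request returns every base file at once.
        List<BaseFile> latestBaseFiles() {
            requests.incrementAndGet();
            return new ArrayList<>(files);
        }
    }

    static String describe(BaseFile f) {
        return f.partition + "/" + f.fileId + ":" + f.minKey + "-" + f.maxKey;
    }

    // Modeled "before": only ids are available, so every file costs one request.
    static List<String> rangesViaPerFileLookup(DriverView view, List<String[]> ids) {
        List<String> ranges = new ArrayList<>();
        for (String[] id : ids) {
            ranges.add(describe(view.latestBaseFile(id[0], id[1])));
        }
        return ranges;
    }

    // Modeled "after": base files are resolved once and used directly.
    static List<String> rangesViaPrefetchedFiles(DriverView view) {
        List<String> ranges = new ArrayList<>();
        for (BaseFile f : view.latestBaseFiles()) {
            ranges.add(describe(f));
        }
        return ranges;
    }

    // Builds n sample files in partition "p0" with made-up key ranges.
    static DriverView sampleView(int n) {
        DriverView view = new DriverView();
        for (int i = 0; i < n; i++) {
            view.add(new BaseFile("p0", "f" + i, "k" + i + "a", "k" + i + "z"));
        }
        return view;
    }

    // The (partition, fileId) pairs matching sampleView(n), in the same order.
    static List<String[]> sampleIds(int n) {
        List<String[]> ids = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            ids.add(new String[] {"p0", "f" + i});
        }
        return ids;
    }

    public static void main(String[] args) {
        DriverView before = sampleView(3);
        rangesViaPerFileLookup(before, sampleIds(3));
        DriverView after = sampleView(3);
        rangesViaPrefetchedFiles(after);
        System.out.println("per-file lookups: " + before.requests.get() + " requests");
        System.out.println("prefetched files: " + after.requests.get() + " requests");
    }
}
```

In this model the number of requests grows with the number of files in the 
first variant and stays constant in the second, which is the effect the 
proposed change aims for. The actual savings in Hudi depend on the 
deployment and are not measured here.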



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
