Chuang Lee created HUDI-4917:
--------------------------------
Summary: Optimized the way to get HoodieBaseFile of
loadColumnRangesFromFiles of Bloom Index
Key: HUDI-4917
URL: https://issues.apache.org/jira/browse/HUDI-4917
Project: Apache Hudi
Issue Type: Improvement
Components: index
Reporter: Chuang Lee
Assignee: Chuang Lee
Fix For: 0.13.0
When the Bloom Index runs loadColumnRangesFromFiles during the tagLocation
process, the existing method obtains each hoodieBaseFile by sending a request
to the Driver side. When the amount of data is large and the parallelism is
high, these requests become a network bottleneck and make tagLocation very
slow.
However, the hoodieBaseFile can be obtained directly through
HoodieIndexUtils.getLatestBaseFilesForAllPartitions() in
loadColumnRangesFromFiles(), which avoids those requests to the Driver side
and can effectively improve the performance of tagLocation for the Bloom
Index.
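
To illustrate the difference in request pattern, here is a minimal,
self-contained Java sketch. It does not use any Hudi classes: BaseFile,
DriverView, perFileLookups and bulkLookup are hypothetical stand-ins, and the
assumption that the bulk call costs a single request is mine, not something
stated in this issue. It only models "one request per file" against "one
request for all partitions".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BaseFileLookupSketch {

    /** Hypothetical stand-in for a base file; not Hudi's HoodieBaseFile. */
    static final class BaseFile {
        final String partition;
        final String fileId;
        final String commitTime;

        BaseFile(String partition, String fileId, String commitTime) {
            this.partition = partition;
            this.fileId = fileId;
            this.commitTime = commitTime;
        }
    }

    /** Simulated driver-side view that counts the requests it serves. */
    static final class DriverView {
        private final Map<String, BaseFile> latest = new HashMap<>();
        int requests = 0;

        DriverView(List<BaseFile> files) {
            for (BaseFile f : files) {
                // Keep only the newest commit per (partition, fileId).
                latest.merge(f.partition + "/" + f.fileId, f,
                    (a, b) -> a.commitTime.compareTo(b.commitTime) >= 0 ? a : b);
            }
        }

        /** One simulated request per file. */
        BaseFile getLatestBaseFile(String partition, String fileId) {
            requests++;
            return latest.get(partition + "/" + fileId);
        }

        /** One simulated request for every partition (assumed cost model). */
        List<BaseFile> getLatestBaseFilesForAllPartitions() {
            requests++;
            return new ArrayList<>(latest.values());
        }
    }

    /** Pattern described as slow: every (partition, fileId) pair asks the driver. */
    static List<BaseFile> perFileLookups(DriverView view, List<String[]> partitionFileIds) {
        List<BaseFile> result = new ArrayList<>();
        for (String[] pf : partitionFileIds) {
            result.add(view.getLatestBaseFile(pf[0], pf[1]));
        }
        return result;
    }

    /** Proposed pattern: resolve all latest base files in one call. */
    static List<BaseFile> bulkLookup(DriverView view) {
        return view.getLatestBaseFilesForAllPartitions();
    }

    public static void main(String[] args) {
        List<BaseFile> files = new ArrayList<>();
        files.add(new BaseFile("2022/09/01", "f1", "001"));
        files.add(new BaseFile("2022/09/01", "f1", "002")); // newer version of f1
        files.add(new BaseFile("2022/09/02", "f2", "001"));

        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"2022/09/01", "f1"});
        pairs.add(new String[] {"2022/09/02", "f2"});

        DriverView perFile = new DriverView(files);
        perFileLookups(perFile, pairs);
        System.out.println("per-file requests: " + perFile.requests);

        DriverView bulk = new DriverView(files);
        bulkLookup(bulk);
        System.out.println("bulk requests: " + bulk.requests);
    }
}
```

In this model the per-file count grows with the number of files being
checked, while the bulk count stays constant. How closely that matches the
real cost depends on how the file system view is served in a given
deployment.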