[ https://issues.apache.org/jira/browse/HUDI-7111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Danny Chen closed HUDI-7111.
----------------------------
Resolution: Fixed
Fixed via master branch: d24220a4804ee6e04346a03a4ddbf2d2711ae301
> Performance regression of Spark job that writes into a simple bucket index
> table
> --------------------------------------------------------------------------------
>
> Key: HUDI-7111
> URL: https://issues.apache.org/jira/browse/HUDI-7111
> Project: Apache Hudi
> Issue Type: Improvement
> Components: spark
> Reporter: Jing Zhang
> Priority: Major
> Labels: pull-request-available
> Fix For: 0.14.1
>
> Attachments: image-2023-11-16-23-41-32-729.png
>
>
> After upgrading to 0.14.0, the performance of a Spark job that writes into a
> simple bucket index table regressed.
> !image-2023-11-16-23-41-32-729.png!
> The cause is [PR#4480|https://github.com/apache/hudi/pull/4480]: the bucket
> index refactor introduced two unnecessary stages in the tagging step for the
> simple bucket index.
> {code:java}
> List<String> partitions =
> records.map(HoodieRecord::getPartitionPath).distinct().collectAsList();
> {code}
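
The snippet quoted above computes the distinct partition paths and collects them to the driver before any record is tagged. In Spark, a distinct typically needs a shuffle and a collect forces a job to run, which is consistent with the extra stages described in the report. The following is a minimal plain-Java sketch of one way to avoid such a pre-pass: load per-partition bucket metadata lazily, the first time a record from that partition is seen, and cache it. All names here (Rec, LazyTagger, the loader function) are hypothetical and are not Hudi APIs, the hashing is a simplification, and this is not necessarily what the referenced commit does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LazyPartitionLookupSketch {

    /** Hypothetical stand-in for a record: a partition path plus a record key. */
    static final class Rec {
        final String partitionPath;
        final String key;

        Rec(String partitionPath, String key) {
            this.partitionPath = partitionPath;
            this.key = key;
        }
    }

    /**
     * Tags records in a single pass. The bucket-to-file-group mapping of a
     * partition is loaded on first use and cached, so no up-front
     * "collect all distinct partitions" step is needed.
     */
    static final class LazyTagger {
        private final int numBuckets;
        private final Function<String, Map<Integer, String>> loader;
        private final Map<String, Map<Integer, String>> cache = new HashMap<>();
        int loads = 0; // how many times the (assumed expensive) loader ran

        LazyTagger(int numBuckets, Function<String, Map<Integer, String>> loader) {
            this.numBuckets = numBuckets;
            this.loader = loader;
        }

        /** Returns the file group for the record's bucket, or null if none exists yet. */
        String tag(Rec r) {
            Map<Integer, String> bucketToFileGroup =
                cache.computeIfAbsent(r.partitionPath, p -> {
                    loads++;
                    return loader.apply(p);
                });
            // Simplified bucket assignment: non-negative hash of the key modulo bucket count.
            int bucket = Math.floorMod(r.key.hashCode(), numBuckets);
            return bucketToFileGroup.get(bucket);
        }
    }

    public static void main(String[] args) {
        // Hypothetical loader: pretends every bucket of a partition already has a file group.
        LazyTagger tagger = new LazyTagger(4, partition -> {
            Map<Integer, String> m = new HashMap<>();
            for (int b = 0; b < 4; b++) {
                m.put(b, partition + "/fg-" + b);
            }
            return m;
        });

        Rec[] records = {
            new Rec("2023/11/15", "a"),
            new Rec("2023/11/16", "b"),
            new Rec("2023/11/15", "c"),
        };
        for (Rec r : records) {
            System.out.println(r.key + " -> " + tagger.tag(r));
        }
        System.out.println("partition loads: " + tagger.loads);
    }
}
```

In this sketch the loader runs once per partition actually encountered, inside the same pass that tags the records, so the cost of discovering partitions is folded into the tagging work instead of being paid as a separate job.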