[ https://issues.apache.org/jira/browse/HUDI-6734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757270#comment-17757270 ]
Akira Ajisaka edited comment on HUDI-6734 at 8/22/23 8:54 AM:
--------------------------------------------------------------
From my understanding, HUDI-5409 was reverted because it changed the default
behavior and made reads slower in general use cases. However, after the revert,
there is no workaround for non-metadata tables other than migrating to metadata
tables. In our case, several existing Spark jobs running Hudi 0.6.0 access the
same table, and Hudi 0.6.0 doesn't support the metadata table, so migrating to
the metadata table is not feasible for these users in the short term.

To solve this issue, I'd like to propose a new parameter that controls whether
to avoid the file index, with a default value of false (i.e. use the file
index by default). This way, existing Hudi metadata table users won't hit
any performance issue by default.
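
As a rough illustration of the proposed opt-in behavior (not actual Hudi
code), the flag could be resolved along these lines. The option key
"example.read.avoid.file.index" and both function names are hypothetical
placeholders, not existing Hudi configuration or API names.

```python
# Hypothetical sketch only: this key is NOT a real Hudi option name.
HYPOTHETICAL_AVOID_FILE_INDEX_KEY = "example.read.avoid.file.index"


def should_avoid_file_index(options: dict) -> bool:
    """Return True only when the reader explicitly opts out of the file index."""
    # A missing key falls back to "false", so the file index stays the default.
    raw = str(options.get(HYPOTHETICAL_AVOID_FILE_INDEX_KEY, "false"))
    return raw.strip().lower() == "true"


def choose_read_path(options: dict) -> str:
    """Pick the read path label based on the (hypothetical) flag."""
    return "no-file-index" if should_avoid_file_index(options) else "file-index"
```

With no option set, choose_read_path({}) selects the file-index path, so
existing users would see no change unless they set the flag to true.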
> Add back HUDI-5409 in Hudi 0.12.x branch
> ----------------------------------------
>
> Key: HUDI-6734
> URL: https://issues.apache.org/jira/browse/HUDI-6734
> Project: Apache Hudi
> Issue Type: Bug
> Components: index, metadata
> Affects Versions: 0.12.3
> Reporter: Akira Ajisaka
> Priority: Critical
>
> Hudi 0.12.3 is more than 10x slower than Hudi 0.12.2 when reading a large
> partitioned non-metadata table. The slowness was originally reported in
> [https://github.com/apache/hudi/issues/6940] and fixed by HUDI-5409. However,
> HUDI-5409 was reverted by HUDI-5411, and I'm still seeing the slowness in Hudi
> 0.12.3.