[
https://issues.apache.org/jira/browse/HUDI-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jian Feng resolved HUDI-4210.
-----------------------------
> Create custom hbase index to solve data skew issue on hbase regions
> -------------------------------------------------------------------
>
> Key: HUDI-4210
> URL: https://issues.apache.org/jira/browse/HUDI-4210
> Project: Apache Hudi
> Issue Type: Improvement
> Components: index
> Reporter: Jian Feng
> Assignee: Jian Feng
> Priority: Major
> Labels: pull-request-available
>
> In our production environment, many tables use auto-increment ids, so
> using the HBase index causes data skew across HBase regions. It would be
> better to find a way to add random prefixes to the keys while still keeping
> ordering within Hudi itself.
> We could make a small modification to the HBase index: add the prefix when
> querying and updating HBase. In
> this way, the pk in HBase will differ from the one in Hudi, but this
> logic will be transparent to business logic. I have adopted this method in
> our prod environment. The custom index can be specified with the
> withIndexClass config in IndexConfig.
>
> Another effort, driven by Uber engineers
> [https://github.com/apache/hudi/pull/3508], could
> technically solve the issue by reading HFiles directly, but it is still in
> progress. The approach described here should resolve this issue immediately.
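
The key-prefixing idea in the description could be sketched roughly as
follows. This is only an illustration and is not taken from the patch: the
class SaltedRowKey, its method names, the two-digit prefix format, and the
bucket count are all hypothetical. It also assumes the prefix is derived
from a hash of the record key rather than being truly random, so that the
same prefix can be recomputed at lookup time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hudi or of the change for HUDI-4210.
final class SaltedRowKey {
    private static final char SEPARATOR = '_';
    // Two bucket digits plus the separator.
    private static final int PREFIX_LENGTH = 3;

    private SaltedRowKey() {}

    // Maps a Hudi record key to the row key stored in HBase by prepending
    // a bucket number computed from a stable hash of the key.
    static String toHBaseKey(String recordKey, int buckets) {
        if (buckets < 1 || buckets > 100) {
            throw new IllegalArgumentException("buckets must be in [1, 100]");
        }
        int hash = 0;
        for (byte b : recordKey.getBytes(StandardCharsets.UTF_8)) {
            hash = 31 * hash + (b & 0xff);
        }
        int bucket = Math.floorMod(hash, buckets);
        // Zero-pad so every prefix has the same width.
        return (bucket < 10 ? "0" : "") + bucket + SEPARATOR + recordKey;
    }

    // Strips the prefix again, so callers only ever see the Hudi key.
    static String toRecordKey(String hbaseKey) {
        if (hbaseKey.length() < PREFIX_LENGTH
                || hbaseKey.charAt(PREFIX_LENGTH - 1) != SEPARATOR) {
            throw new IllegalArgumentException("not a salted key: " + hbaseKey);
        }
        return hbaseKey.substring(PREFIX_LENGTH);
    }
}
```

With 16 buckets, the consecutive ids "1" and "2" map to "01_1" and "02_2",
so neighbouring auto-increment ids no longer sort next to each other in
HBase. Keys that share a bucket still sort in their original lexicographic
order. A custom index would apply toHBaseKey before each HBase get or put
and toRecordKey on the way back, which keeps the prefix invisible to
business logic.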
--
This message was sent by Atlassian Jira
(v8.20.10#820010)