Jian Feng created HUDI-4210:
-------------------------------
Summary: Create custom HBase index to solve data skew issue on HBase regions
Key: HUDI-4210
URL: https://issues.apache.org/jira/browse/HUDI-4210
Project: Apache Hudi
Issue Type: Improvement
Components: index
Reporter: Jian Feng
Assignee: Jian Feng
In our production environment, many tables use an auto-increment id as the
record key. With the HBase index, this causes data skew across HBase regions,
because consecutive ids sort next to each other and land in the same region.
It would be better to add random prefixes to the keys stored in HBase while
keeping the original ordering in Hudi itself.
This needs only a small modification to the HBase index: add the prefix
wherever the index queries or updates HBase. The primary key in HBase will
then differ from the one in Hudi, but this logic is transparent to business
logic. I have adopted this method in our production environment. The custom
index can be specified with the withIndexClass config in IndexConfig.
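The prefixing idea can be sketched as follows. This is a minimal illustration,
not the code used in production: the class name SaltedRowKey, the bucket count,
the CRC32-based prefix, and the "_" separator are all assumptions made for the
example. The issue says "random prefixes"; the sketch derives the prefix from a
hash of the key instead, on the assumption that the index must be able to
recompute the same prefix when it looks a key up again.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class SaltedRowKey {
    // Hypothetical bucket count; in practice it would likely be chosen to
    // match how the HBase table is pre-split into regions.
    static final int NUM_BUCKETS = 16;

    // Map a Hudi record key to the row key stored in HBase by prepending a
    // fixed-width prefix derived from a hash of the key. The same input
    // always yields the same prefix, so lookups and updates agree.
    static String toHBaseKey(String recordKey) {
        CRC32 crc = new CRC32();
        crc.update(recordKey.getBytes(StandardCharsets.UTF_8));
        int bucket = (int) (crc.getValue() % NUM_BUCKETS);
        return String.format("%02d_%s", bucket, recordKey);
    }

    // Strip the prefix again so the rest of the pipeline only ever sees the
    // original Hudi record key.
    static String toHudiKey(String hbaseKey) {
        return hbaseKey.substring(hbaseKey.indexOf('_') + 1);
    }

    public static void main(String[] args) {
        // Consecutive auto-increment ids are printed with their salted keys.
        for (long id = 1000; id < 1005; id++) {
            String key = Long.toString(id);
            System.out.println(key + " -> " + toHBaseKey(key));
        }
    }
}
```

A custom index would apply toHBaseKey before every HBase get or put and
toHudiKey on the way back, which is what keeps the change invisible to
business logic.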
Other work, driven by Uber engineers
[https://github.com/apache/hudi/pull/3508], could
technically solve the issue by reading HFiles directly, but it is still in
progress. The approach proposed here should resolve the issue immediately.