Vinoth Govindarajan created HUDI-3091:
-----------------------------------------
Summary: Make simple index as the default hoodie.index.type
Key: HUDI-3091
URL: https://issues.apache.org/jira/browse/HUDI-3091
Project: Apache Hudi
Issue Type: New Feature
Components: Index
Reporter: Vinoth Govindarajan
When performing upserts on derived datasets, we often run into OOM
(out-of-memory) errors with the bloom index, so we changed the index type of
all these datasets to simple to resolve the issue.
Some of these tables were non-partitioned, and the bloom index is not the right
choice for them.
I'm proposing to make the simple index the default value of hoodie.index.type.
On a case-by-case basis, folks can still choose the bloom index for the
additional performance gains that bloom filters offer.
I agree that performance will not be optimal, but for regular use cases the
simple index will not break: it may give sub-optimal read/write performance,
but it won't break any ingestion or derived jobs.
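A minimal sketch of what the proposal would mean for job authors, assuming
writes go through the Spark datasource with a dict of options. The helper
upsert_options, the constant PROPOSED_DEFAULT, and the table/field names are
hypothetical and only illustrate "SIMPLE unless a table opts into bloom"; the
hoodie.* keys are standard Hudi write configs, and the sketch deliberately
limits itself to the four bloom/simple index types discussed here.

```python
from typing import Dict, Optional

# Index types relevant to this proposal (Hudi supports others, e.g. HBASE).
BLOOM_BASED = {"BLOOM", "GLOBAL_BLOOM"}
SIMPLE_BASED = {"SIMPLE", "GLOBAL_SIMPLE"}
# Hypothetical: the default this issue proposes for hoodie.index.type.
PROPOSED_DEFAULT = "SIMPLE"


def upsert_options(table_name: str, record_key: str, precombine_field: str,
                   index_type: Optional[str] = None) -> Dict[str, str]:
    """Build Hudi upsert write options, falling back to the proposed default."""
    chosen = (index_type or PROPOSED_DEFAULT).upper()
    if chosen not in BLOOM_BASED | SIMPLE_BASED:
        raise ValueError(f"index type not covered by this sketch: {chosen}")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.index.type": chosen,
    }


# Derived dataset: relies on the proposed default (no bloom filter memory cost).
derived = upsert_options("derived_orders", "order_id", "updated_at")
# Table where bloom filters are known to pay off: opt in explicitly.
tuned = upsert_options("raw_events", "event_id", "ts", index_type="bloom")
print(derived["hoodie.index.type"], tuned["hoodie.index.type"])
```

With PySpark and the Hudi bundle on the classpath, such a dict would typically
be passed as df.write.format("hudi").options(**derived).mode("append").save(path).
The point of the design is that the safe choice needs no configuration, while
the faster but more memory-hungry bloom index becomes an explicit opt-in.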