zhangyue19921010 opened a new pull request, #12884:
URL: https://github.com/apache/hudi/pull/12884

   ### Change Logs
   
   As we know, Hudi proposed and introduced the Bucket Index in RFC-29. The Bucket Index unifies the indexes of Flink and
   Spark: Spark and Flink can both upsert the same Hudi table using the bucket index.
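   The following is a minimal sketch of why a fixed-size bucket index lets different engines agree. It assumes the bucket is
   simply the record key's `String.hashCode()` modulo the bucket count. `BucketIdSketch` and `bucketId` are hypothetical
   names, and Hudi's actual hashing (which key fields, which hash function) may differ. The point is only that the mapping
   is a pure function of the key and the bucket number, so a Spark writer and a Flink writer compute the same bucket.

   ```java
   import java.util.List;

   /**
    * Hypothetical sketch of fixed-size bucket assignment, assuming the bucket
    * is derived from String.hashCode() of the record key. Not Hudi's code.
    */
   public class BucketIdSketch {

       /** Deterministically maps a record key to a bucket in [0, numBuckets). */
       static int bucketId(String recordKey, int numBuckets) {
           if (numBuckets <= 0) {
               throw new IllegalArgumentException("numBuckets must be positive");
           }
           // Clear the sign bit so negative hash codes still yield a valid index.
           return (recordKey.hashCode() & Integer.MAX_VALUE) % numBuckets;
       }

       public static void main(String[] args) {
           int numBuckets = 8;
           for (String key : List.of("uuid-001", "uuid-002", "uuid-003")) {
               // Any writer (a Spark job or a Flink job) evaluating the same
               // function with the same bucket count picks the same bucket.
               System.out.println(key + " -> bucket " + bucketId(key, numBuckets));
           }
       }
   }
   ```

   The same property is what makes the bucket count hard to change later: with `hash % numBuckets`, a different
   `numBuckets` sends many existing keys to different buckets.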
   
   However, the Bucket Index is limited to a fixed number of buckets. To solve this problem, RFC-42 proposed consistent
   hashing, which resizes buckets by dynamically splitting or merging a few local buckets.
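   For contrast with a fixed modulo, here is a generic consistent-hashing sketch (an illustration of the technique, not
   RFC-42's implementation). It assumes the hash space `[0, Integer.MAX_VALUE]` is cut into contiguous ranges, each owned
   by one bucket and identified by its inclusive upper bound. Splitting one range reassigns only the keys inside it.
   `ConsistentHashSketch`, `split`, and the evenly spaced initial layout are hypothetical.

   ```java
   import java.util.Map;
   import java.util.TreeMap;

   /** Generic consistent-hashing sketch; hypothetical, not Hudi's RFC-42 code. */
   public class ConsistentHashSketch {
       // Inclusive upper bound of each hash range -> owning bucket.
       private final TreeMap<Integer, String> ranges = new TreeMap<>();

       ConsistentHashSketch(int initialBuckets) {
           long step = (Integer.MAX_VALUE + 1L) / initialBuckets;
           for (int i = 0; i < initialBuckets; i++) {
               // The last range always ends at Integer.MAX_VALUE so every hash is covered.
               int upper = (i == initialBuckets - 1) ? Integer.MAX_VALUE : (int) (step * (i + 1) - 1);
               ranges.put(upper, "bucket-" + i);
           }
       }

       static int hash(String key) {
           return key.hashCode() & Integer.MAX_VALUE;
       }

       /** A key belongs to the first range whose upper bound is >= its hash. */
       String bucketFor(String key) {
           return ranges.ceilingEntry(hash(key)).getValue();
       }

       /**
        * Splits the range ending at upperBound: the lower half goes to newBucket,
        * the upper half stays with the old owner. Other ranges are untouched.
        */
       void split(int upperBound, String newBucket) {
           if (!ranges.containsKey(upperBound)) {
               throw new IllegalArgumentException("no range ends at " + upperBound);
           }
           Map.Entry<Integer, String> prev = ranges.lowerEntry(upperBound);
           int start = (prev == null) ? 0 : prev.getKey() + 1;
           if (start >= upperBound) {
               throw new IllegalArgumentException("range too small to split");
           }
           int mid = start + (upperBound - start) / 2;
           ranges.put(mid, newBucket);
       }

       public static void main(String[] args) {
           ConsistentHashSketch ring = new ConsistentHashSketch(4);
           int bucket0Upper = (int) ((Integer.MAX_VALUE + 1L) / 4 - 1);
           Map<String, String> before = new TreeMap<>();
           for (int i = 0; i < 1000; i++) {
               before.put("key-" + i, ring.bucketFor("key-" + i));
           }
           ring.split(bucket0Upper, "bucket-4");
           int moved = 0;
           for (Map.Entry<String, String> e : before.entrySet()) {
               if (!ring.bucketFor(e.getKey()).equals(e.getValue())) {
                   moved++;
               }
           }
           // Only keys previously in bucket-0 can have moved (to bucket-4).
           System.out.println("keys moved by the split: " + moved);
       }
   }
   ```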
   
   But from production experience, a Partition-Level Bucket Index plus an offline way to rescale buckets is sometimes good
   enough, without introducing additional machinery (multiple writes, clustering, automatic resizing, etc.). The more
   complex the architecture, the more error-prone it is and the greater the operation and maintenance burden.
   
   To this end, we can upgrade the traditional Bucket Index to a Partition-Level Bucket Index, so that users can set a
   specific number of buckets for different partitions through a rule engine (such as regular expression matching).
   For existing partitions, an offline command is provided to reorganize the data using insert overwrite (writes to the
   affected partition must be stopped first).
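   A minimal sketch of the rule-engine idea follows. It assumes rules are ordered regular expressions over the partition
   path, where the first full match wins and unmatched partitions fall back to a table-level default. `PartitionBucketRules`,
   `addRule`, and `numBucketsFor` are hypothetical names, not Hudi classes or configs. The nested `record` needs Java 16+.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.regex.Pattern;

   /** Hypothetical partition-level bucket rules: first matching regex wins. */
   public class PartitionBucketRules {
       private record Rule(Pattern pattern, int numBuckets) {}

       private final List<Rule> rules = new ArrayList<>();
       private final int defaultNumBuckets;

       PartitionBucketRules(int defaultNumBuckets) {
           this.defaultNumBuckets = defaultNumBuckets;
       }

       /** Appends a rule; earlier rules take precedence over later ones. */
       PartitionBucketRules addRule(String regex, int numBuckets) {
           if (numBuckets <= 0) {
               throw new IllegalArgumentException("numBuckets must be positive");
           }
           rules.add(new Rule(Pattern.compile(regex), numBuckets));
           return this;
       }

       /** Returns the bucket count for a partition path, or the default. */
       int numBucketsFor(String partitionPath) {
           for (Rule rule : rules) {
               // matches() requires the whole partition path to match the regex.
               if (rule.pattern().matcher(partitionPath).matches()) {
                   return rule.numBuckets();
               }
           }
           return defaultNumBuckets;
       }

       public static void main(String[] args) {
           PartitionBucketRules rules = new PartitionBucketRules(4)
               .addRule("dt=2024-11-1[0-2]", 64)   // a few heavy days
               .addRule("dt=2024-11-.*", 16);      // rest of the month
           System.out.println(rules.numBucketsFor("dt=2024-11-11")); // 64
           System.out.println(rules.numBucketsFor("dt=2024-11-05")); // 16
           System.out.println(rules.numBucketsFor("dt=2024-10-01")); // 4
       }
   }
   ```

   In this model a record's bucket depends on its key and its partition's bucket count, so changing the rule for a
   partition that already holds data would make existing records resolve to different buckets. That is why the rules
   are paired with an offline insert-overwrite rescale rather than an in-place change. With no rules configured, every
   partition resolves to the default, which is one way an existing table-level Bucket Index could keep its layout.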
   
   More importantly, existing Bucket Index tables can be upgraded to the Partition-Level Bucket Index smoothly and
   seamlessly.
   
   ### Impact
   
   no
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the default value of a config is changed_
   - _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the
     ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   