[PR] [HUDI-9535] Prevent Large Broadcast in Spark Partitioner [hudi]

via GitHub Wed, 18 Jun 2025 10:03:16 -0700


jonvex opened a new pull request, #13459:
URL: https://github.com/apache/hudi/pull/13459


   ### Change Logs
   
   For large tables, broadcasting some of the Spark partitioners can be a 
performance issue. Most of the data in a partitioner is not used by 
getBucketInfo(), so a SparkBucketInfoGetter abstraction is added as a 
lightweight holder of only the data that getBucketInfo() needs.
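
The idea described above can be sketched in plain Java without a Spark dependency. This is an illustrative sketch only, not the code in this PR: `HeavyPartitioner`, `LightweightBucketInfoGetter`, `BucketInfo`, and the `planningState` field are hypothetical stand-ins, and Java serialization is used here only as a rough proxy for broadcast size.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class BucketInfoGetterSketch {

    /** Hypothetical stand-in for the value returned by getBucketInfo(). */
    static final class BucketInfo implements Serializable {
        final String fileIdPrefix;
        final String partitionPath;

        BucketInfo(String fileIdPrefix, String partitionPath) {
            this.fileIdPrefix = fileIdPrefix;
            this.partitionPath = partitionPath;
        }
    }

    /** Hypothetical "heavy" partitioner: carries state that bucket lookups never read. */
    static final class HeavyPartitioner implements Serializable {
        final Map<Integer, BucketInfo> bucketInfoMap = new HashMap<>();
        // Assumption: stands in for planning-only state that is large on big tables.
        final byte[] planningState = new byte[1_000_000];

        BucketInfo getBucketInfo(int bucketNumber) {
            return bucketInfoMap.get(bucketNumber);
        }

        /** Copies out only the data that getBucketInfo() needs. */
        LightweightBucketInfoGetter toBucketInfoGetter() {
            return new LightweightBucketInfoGetter(new HashMap<>(bucketInfoMap));
        }
    }

    /** Hypothetical lightweight holder: the only object that would be shipped to tasks. */
    static final class LightweightBucketInfoGetter implements Serializable {
        private final Map<Integer, BucketInfo> bucketInfoMap;

        LightweightBucketInfoGetter(Map<Integer, BucketInfo> bucketInfoMap) {
            this.bucketInfoMap = bucketInfoMap;
        }

        BucketInfo getBucketInfo(int bucketNumber) {
            return bucketInfoMap.get(bucketNumber);
        }
    }

    /** Serializes a value with Java serialization and returns the byte count. */
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    public static void main(String[] args) {
        HeavyPartitioner partitioner = new HeavyPartitioner();
        for (int i = 0; i < 100; i++) {
            partitioner.bucketInfoMap.put(i, new BucketInfo("file-" + i, "2025/06/18"));
        }
        LightweightBucketInfoGetter getter = partitioner.toBucketInfoGetter();
        System.out.println("partitioner bytes: " + serializedSize(partitioner));
        System.out.println("getter bytes:      " + serializedSize(getter));
    }
}
```

In this sketch the getter answers the same lookups as the partitioner but serializes without the unused planning state, which is the property the change relies on to keep the broadcast small.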
   
   ### Impact
   
   Performance improvement
   
   ### Risk level (write none, low, medium or high below)
   
   low
   
   ### Documentation Update
   
   N/A
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.