[GitHub] [hudi] scxwhite opened a new pull request, #7309: [HUDI-5284] Add new config controls whether input rdds should be first persist before insert.

GitBox Sun, 27 Nov 2022 20:55:24 -0800


scxwhite opened a new pull request, #7309:
URL: https://github.com/apache/hudi/pull/7309


   ### Change Logs
   
   When the input data comes from a complex RDD lineage, a Hudi write 
recomputes that lineage several times.
   For example, the write deduplicates the input records by key, and the tag 
location step separately collects all partitions that will be written. Each of 
these steps re-evaluates the input, so I think we should cache the data to be 
written for downstream use.
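
The repeated computation described above can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark or Hudi code: `LazyDataset`, `run_write`, and `make_source` are hypothetical names introduced only for illustration, and the two passes in `run_write` only loosely mirror the dedup and partition-lookup steps.

```python
# Toy model of lazy lineage re-evaluation; this is NOT Spark or Hudi code.

class LazyDataset:
    """Hypothetical stand-in for an RDD: re-runs its source on every action."""

    def __init__(self, compute):
        self._compute = compute   # function that produces the records
        self._persist = False     # set by persist()
        self._cached = None       # filled by the first action after persist()

    def persist(self):
        self._persist = True
        return self

    def collect(self):
        if self._cached is not None:
            return self._cached
        records = list(self._compute())
        if self._persist:
            self._cached = records
        return records


def run_write(dataset):
    """Two downstream passes over the same input."""
    # Pass 1: deduplicate by key (the last record for a key wins).
    deduped = {key: value for key, value, _ in dataset.collect()}
    # Pass 2: collect the distinct partitions that would be written.
    partitions = sorted({partition for _, _, partition in dataset.collect()})
    return deduped, partitions


def make_source(counter):
    def compute():
        counter["runs"] += 1  # counts evaluations of the "expensive" lineage
        return [
            ("k1", "a", "2022/11/27"),
            ("k1", "b", "2022/11/27"),
            ("k2", "c", "2022/11/28"),
        ]
    return compute


uncached_runs = {"runs": 0}
run_write(LazyDataset(make_source(uncached_runs)))

cached_runs = {"runs": 0}
deduped, partitions = run_write(LazyDataset(make_source(cached_runs)).persist())

print(uncached_runs["runs"], cached_runs["runs"])
```

Without `persist()` the source is evaluated once per pass; with it, the first pass fills the cache and the second pass reuses it.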
   
   ### Impact
   
   Adds a new config (hoodie.persist.before.insert) that controls whether the 
input RDDs are persisted before insert.
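
As a rough sketch of how the option might be set, assuming it takes a boolean string like other Hudi flags and is honored by the writer in use (neither is confirmed in this description): only `hoodie.persist.before.insert` comes from this PR, and the table name is a placeholder.

```python
# Hypothetical writer options for illustration only.
# Assumption: hoodie.persist.before.insert accepts "true"/"false".
hudi_options = {
    "hoodie.table.name": "example_table",           # placeholder table name
    "hoodie.datasource.write.operation": "insert",  # plain insert operation
    "hoodie.persist.before.insert": "true",         # config proposed in this PR
}

# With PySpark, such a dict is typically passed to the writer as:
#   df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
# where df and base_path are supplied by the caller.
```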
   ### Risk level (write none, low medium or high below)
   
   low
   ### Documentation Update
   
   Adds a new config (hoodie.persist.before.insert) that controls whether the 
input RDDs are persisted before insert.
   
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


