scxwhite opened a new pull request, #7309: URL: https://github.com/apache/hudi/pull/7309
### Change Logs

When the input data comes from a complex RDD lineage, writing to Hudi causes that lineage to be computed repeatedly. For example, the input is deduplicated by record key, and the tag-location step then collects all partitions that will be written, so the input is evaluated more than once. I think we should cache the data to be written so that downstream steps can reuse it.

### Impact

Adds a new config, `hoodie.persist.before.insert`, which controls whether the input records are persisted before insert.

### Risk level (write none, low medium or high below)

low

### Documentation Update

Adds a new config, `hoodie.persist.before.insert`, which controls whether the input records are persisted before insert.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
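The repeated-computation problem described under Change Logs can be illustrated with a small plain-Java sketch. It does not use Spark or Hudi APIs: a `Supplier` stands in for the lazy RDD lineage, and the hypothetical `persisted` helper stands in for persisting the input. The class, method, and record names are assumptions made for illustration only; the two downstream steps (deduplicating by key, collecting target partitions) mirror the ones the PR mentions.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Plain-Java stand-in for a lazily evaluated input; this is NOT Spark's RDD API.
class PersistBeforeInsertSketch {

    // Hypothetical record shape: a record key plus the partition it belongs to.
    static final class Rec {
        final String key;
        final String partition;

        Rec(String key, String partition) {
            this.key = key;
            this.partition = partition;
        }
    }

    // Wraps a lazy source so it is computed at most once (a stand-in for persisting).
    static <T> Supplier<T> persisted(Supplier<T> source) {
        return new Supplier<T>() {
            private T cached;
            private boolean computed;

            @Override
            public synchronized T get() {
                if (!computed) {
                    cached = source.get();
                    computed = true;
                }
                return cached;
            }
        };
    }

    // Runs both downstream steps and returns how often the lineage was evaluated.
    static int runWrite(boolean persistBeforeInsert) {
        AtomicInteger evaluations = new AtomicInteger();

        // The "complex lineage": every call to get() recomputes the input.
        Supplier<List<Rec>> lineage = () -> {
            evaluations.incrementAndGet();
            return Arrays.asList(
                new Rec("k1", "2022/11/01"),
                new Rec("k1", "2022/11/01"),
                new Rec("k2", "2022/11/02"));
        };
        Supplier<List<Rec>> input = persistBeforeInsert ? persisted(lineage) : lineage;

        // Step 1: deduplicate by record key (first pass over the input).
        Map<String, Rec> deduped = new LinkedHashMap<>();
        for (Rec r : input.get()) {
            deduped.putIfAbsent(r.key, r);
        }

        // Step 2: collect the partitions to be written (second pass over the input).
        Set<String> partitions = new TreeSet<>();
        for (Rec r : input.get()) {
            partitions.add(r.partition);
        }

        return evaluations.get();
    }

    public static void main(String[] args) {
        System.out.println("evaluations without persist: " + runWrite(false)); // 2
        System.out.println("evaluations with persist: " + runWrite(true));     // 1
    }
}
```

In this sketch the boolean argument plays the role of `hoodie.persist.before.insert`. How the actual change wires the config into the write path is not shown here.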
