zhangyue19921010 opened a new pull request, #7169: URL: https://github.com/apache/hudi/pull/7169
### Change Logs

The configured shuffle parallelism (`hoodie.upsert.shuffle.parallelism` / `hoodie.insert.shuffle.parallelism`) does not take effect when `hoodie.combine.before.upsert` / `hoodie.combine.before.insert` is `false`. In our production experience, writing without a shuffle can cause data skew.

```
hoodie.upsert.shuffle.parallelism
Parallelism to use for upsert operation on the table. Upserts can shuffle data to perform index lookups, file sizing, bin packing records optimally into file groups.
Default Value: 200 (Optional)
Config Param: UPSERT_PARALLELISM_VALUE

hoodie.insert.shuffle.parallelism
Parallelism for inserting records into the table. Inserts can shuffle data before writing to tune file sizes and optimize the storage layout.
Default Value: 200 (Optional)
Config Param: INSERT_PARALLELISM_VALUE
```

### Impact

- Upsert with `hoodie.combine.before.upsert` set to `false` (default: `true`)
- Insert with `hoodie.combine.before.insert` set to `false` (default: `false`)

### Risk level (write none, low medium or high below)

Low or medium?

### Documentation Update

No.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
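The skew described under Change Logs can be illustrated with a toy Java model. This is a hypothetical sketch, not Hudi source code. `ParallelismModel`, `effectiveParallelism` and `largestTask` are invented names. The model assumes two things: the configured shuffle parallelism is applied only by the combine (de-duplication) step, which reshuffles records by key, and record keys hash uniformly across the shuffle partitions.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model (not Hudi code) of write-task sizing with and
 * without the combine-before-write shuffle.
 */
final class ParallelismModel {

    private ParallelismModel() {}

    /** Number of write tasks, assuming only the combine step applies the configured parallelism. */
    static int effectiveParallelism(boolean combineBeforeWrite,
                                    int configuredShuffleParallelism,
                                    int inputPartitions) {
        // No combine means no reshuffle, so the input partition count is kept.
        return combineBeforeWrite ? configuredShuffleParallelism : inputPartitions;
    }

    /** Records handled by the largest write task (ignores duplicates removed by combining). */
    static long largestTask(long[] inputPartitionSizes,
                            boolean combineBeforeWrite,
                            int configuredShuffleParallelism) {
        if (combineBeforeWrite) {
            long total = Arrays.stream(inputPartitionSizes).sum();
            // Assumes uniform key hashing: ceil(total / parallelism).
            return (total + configuredShuffleParallelism - 1) / configuredShuffleParallelism;
        }
        // Without a shuffle each input partition becomes one write task, skew included.
        return Arrays.stream(inputPartitionSizes).max().orElse(0L);
    }

    public static void main(String[] args) {
        long[] skewed = {1_000_000L, 10L, 10L};
        System.out.println(effectiveParallelism(false, 200, skewed.length)); // 3
        System.out.println(largestTask(skewed, false, 200));                 // 1000000
        System.out.println(largestTask(skewed, true, 200));                  // 5001
    }
}
```

With one heavily loaded input partition, the unshuffled write keeps 3 tasks and one of them handles all 1,000,000 records of that partition, while a 200-way shuffle caps the largest task at 5,001 records under the uniform-hashing assumption.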
