zhangyue19921010 opened a new pull request, #7169:
URL: https://github.com/apache/hudi/pull/7169

   ### Change Logs
   
   The configured shuffle parallelism does not take effect when 
`hoodie.combine.before.upsert` or `hoodie.combine.before.insert` is false.
   In our production experience, writing without this shuffle can cause data skew.
   
   ```
   hoodie.upsert.shuffle.parallelism
   
   Parallelism to use for upsert operation on the table. Upserts can shuffle 
data to perform index lookups, file sizing, bin packing records optimally into 
file groups.
   Default Value: 200 (Optional)
   Config Param: UPSERT_PARALLELISM_VALUE
   
   hoodie.insert.shuffle.parallelism
   
   Parallelism for inserting records into the table. Inserts can shuffle data 
before writing to tune file sizes and optimize the storage layout.
   Default Value: 200 (Optional)
   Config Param: INSERT_PARALLELISM_VALUE
   ```
   
   ### Impact
   Affects upserts when `hoodie.combine.before.upsert` is false (default: true).
   Affects inserts when `hoodie.combine.before.insert` is false (default: false).
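
A minimal sketch of the configuration combinations this change touches, assuming the options are assembled as a plain Python dict. The helper names `build_write_options` and `is_affected` are hypothetical and not part of Hudi; the config keys and defaults are the ones quoted in this description.

```python
# Defaults for the combine flags, as stated in the Impact section above.
COMBINE_DEFAULTS = {
    "hoodie.combine.before.upsert": "true",
    "hoodie.combine.before.insert": "false",
}


def build_write_options(operation, parallelism, combine_before=None):
    """Hypothetical helper: assemble Hudi write options for 'upsert' or 'insert'."""
    if operation not in ("upsert", "insert"):
        raise ValueError(f"unsupported operation: {operation}")
    options = {f"hoodie.{operation}.shuffle.parallelism": str(parallelism)}
    if combine_before is not None:
        # Hudi options are passed as strings, so normalise the boolean.
        options[f"hoodie.combine.before.{operation}"] = str(combine_before).lower()
    return options


def is_affected(operation, options):
    """Hypothetical helper: True when the combine flag for this operation is
    false, which is the case this PR describes as impacted."""
    key = f"hoodie.combine.before.{operation}"
    return options.get(key, COMBINE_DEFAULTS[key]) == "false"


upsert_opts = build_write_options("upsert", 400, combine_before=False)
insert_opts = build_write_options("insert", 200)  # combine flag left at its default

print(upsert_opts, is_affected("upsert", upsert_opts))
print(insert_opts, is_affected("insert", insert_opts))
```

With PySpark, a dict like this would typically be handed to the writer via `df.write.format("hudi").options(**upsert_opts)`, alongside the table's other required options.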
   
   ### Risk level (write none, low, medium or high below)
   
   low or medium ?
   
   ### Documentation Update
   
   no
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

