yihua opened a new pull request, #7396: URL: https://github.com/apache/hudi/pull/7396
### Change Logs

Before this change, the NONE sort mode for bulk insert coalesced the input records or rows down to the bulk insert shuffle parallelism (`hoodie.bulkinsert.shuffle.parallelism`), which reduced the write parallelism. This could increase write latency when the reduced parallelism left cluster workers underutilized. This PR removes the coalesce from the NONE sort mode for bulk insert so that it matches the default Parquet write behavior.

### Impact

Removing the coalesce from the NONE sort mode reduces bulk insert write latency when the input parallelism is higher than the bulk insert shuffle parallelism and the lower shuffle parallelism would otherwise leave cluster workers underutilized.

### Risk level

low

### Documentation Update

[HUDI-5339](https://issues.apache.org/jira/browse/HUDI-5339) covers updating the docs for the behavior change in NONE sort mode for bulk insert.

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
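To make the latency argument in the description concrete, here is a minimal back-of-the-envelope sketch in plain Python. It is not Hudi or Spark code: `effective_partitions`, `write_plan`, and the cluster numbers are hypothetical names and values chosen for illustration. It assumes the removed coalesce was shuffle-free, so it could only lower the partition count, and it models each partition as one write task.

```python
from typing import Dict, Optional


def effective_partitions(input_partitions: int, coalesce_to: Optional[int] = None) -> int:
    """Partition count after an optional shuffle-free coalesce.

    Assumption: a shuffle-free coalesce can only reduce the number of
    partitions; asking for more than the input has leaves it unchanged.
    """
    if coalesce_to is None:
        return input_partitions
    return min(input_partitions, coalesce_to)


def write_plan(input_partitions: int, total_cores: int,
               coalesce_to: Optional[int] = None) -> Dict[str, float]:
    """Rough model of how many write tasks can run at once (hypothetical helper)."""
    partitions = effective_partitions(input_partitions, coalesce_to)
    concurrent = min(partitions, total_cores)
    return {
        "write_tasks": partitions,
        "concurrent_tasks": concurrent,
        "idle_cores": total_cores - concurrent,
        # How many input partitions each write task has to process.
        "inputs_per_task": input_partitions / partitions,
    }


# Hypothetical cluster: 400 input partitions, 200 cores,
# bulk insert shuffle parallelism of 50.
before = write_plan(400, 200, coalesce_to=50)  # old behavior: coalesce applied
after = write_plan(400, 200)                   # new behavior: no coalesce

print("before:", before)
print("after: ", after)
```

In this made-up scenario the coalesce leaves 150 of 200 cores idle and makes each write task process eight input partitions, while the uncoalesced write keeps every core busy. When the input parallelism is already at or below the shuffle parallelism, the two plans are identical, which matches the description's point that the change matters only when the input parallelism is higher.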
