yihua opened a new pull request, #7396:
URL: https://github.com/apache/hudi/pull/7396

   ### Change Logs
   
   Before this change, the NONE sort mode for bulk insert coalesces the 
input records or rows down to the shuffle parallelism of bulk insert 
(`hoodie.bulkinsert.shuffle.parallelism`).  This can increase write latency 
when the reduced parallelism leaves the cluster workers underutilized.
   
   This PR removes the coalesce within NONE sort mode for bulk insert to match 
the default parquet write behavior.
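
   For context, a minimal sketch of the writer options that select the NONE 
sort mode for a bulk insert. The table name is a hypothetical placeholder, and 
the commented-out write call assumes an existing Spark DataFrame `df` and a 
target `base_path`, neither of which is part of this PR.

```python
# Writer options selecting bulk insert with the NONE sort mode.
# "example_table" is a placeholder name used only for illustration.
hudi_options = {
    "hoodie.table.name": "example_table",
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.bulkinsert.sort.mode": "NONE",
}

# Assumed usage with an existing DataFrame `df` and path `base_path`:
# df.write.format("hudi").options(**hudi_options).mode("append").save(base_path)
```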
   
   ### Impact
   
   Removing the coalesce from the NONE sort mode for bulk insert reduces 
write latency when the input parallelism is higher than the shuffle 
parallelism of bulk insert and the lower value would leave the cluster 
workers underutilized.
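
   The effect can be illustrated with a simplified model, not Hudi code. It 
assumes one write task per partition and relies on the fact that a coalesce 
only ever lowers the partition count. The partition, parallelism, and slot 
numbers are made-up values for illustration.

```python
def write_task_count(input_partitions: int, shuffle_parallelism: int, coalesce: bool) -> int:
    """Number of write tasks, assuming one task per partition."""
    if coalesce:
        # Coalescing never increases the partition count.
        return min(input_partitions, shuffle_parallelism)
    return input_partitions


def busy_slots(tasks: int, total_slots: int) -> int:
    """Executor slots that can be kept busy at the same time."""
    return min(tasks, total_slots)


# Hypothetical numbers: 400 input partitions, shuffle parallelism of 100,
# and a cluster with 400 executor slots.
INPUT_PARTITIONS = 400
SHUFFLE_PARALLELISM = 100
TOTAL_SLOTS = 400

before = write_task_count(INPUT_PARTITIONS, SHUFFLE_PARALLELISM, coalesce=True)
after = write_task_count(INPUT_PARTITIONS, SHUFFLE_PARALLELISM, coalesce=False)

print("with coalesce:", before, "tasks,", busy_slots(before, TOTAL_SLOTS), "busy slots")
print("without coalesce:", after, "tasks,", busy_slots(after, TOTAL_SLOTS), "busy slots")
```

   Under these assumptions the coalesced write keeps only a quarter of the 
slots busy, while the write without coalesce can use all of them.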
   
   ### Risk level
   
   low
   
   ### Documentation Update
   
   [HUDI-5339](https://issues.apache.org/jira/browse/HUDI-5339) tracks 
updating the docs for the behavior change in NONE sort mode for bulk insert.
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
