vinishjail97 opened a new pull request, #11758:
URL: https://github.com/apache/hudi/pull/11758

   ### Change Logs
   
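   If the user-defined sort columns are skewed (many records share the same sort values), the Spark sort ends up with fewer tasks, which increases contention when writing Parquet files. This change adds an option to append the record key to the user-defined sort columns to avoid that skew.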
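   
   The following sketch shows why skewed sort columns hurt. It is a simplified model of range partitioning on the sort key, not Spark's actual `RangePartitioner`, which samples the input and weights its candidates. Like Spark's range bounds, the boundaries here must be strictly increasing, so duplicate sort values collapse into a single boundary and almost every record lands in one range. Appending a unique record key (the hypothetical `key-NNNN` values) to the sort value restores evenly sized ranges.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   
   // Simplified model of range partitioning on a sort key (not Spark's RangePartitioner).
   public class SkewDemo {
   
       // Pick up to (numPartitions - 1) upper bounds at evenly spaced ranks of the sorted keys.
       // A candidate is kept only if it is strictly greater than the previous bound, so
       // repeated sort values collapse into one bound. Assumes keys.size() >= numPartitions.
       static List<String> determineBounds(List<String> keys, int numPartitions) {
           List<String> sorted = new ArrayList<>(keys);
           Collections.sort(sorted);
           List<String> bounds = new ArrayList<>();
           for (int i = 1; i < numPartitions; i++) {
               String candidate = sorted.get(i * sorted.size() / numPartitions - 1);
               if (bounds.isEmpty() || candidate.compareTo(bounds.get(bounds.size() - 1)) > 0) {
                   bounds.add(candidate);
               }
           }
           return bounds;
       }
   
       // Count records per range: a key goes to the first range whose upper bound is >= key.
       static int[] partitionSizes(List<String> keys, List<String> bounds) {
           int[] sizes = new int[bounds.size() + 1];
           for (String key : keys) {
               int idx = Collections.binarySearch(bounds, key);
               int partition = idx >= 0 ? idx : -idx - 1; // insertion point when not found
               sizes[partition]++;
           }
           return sizes;
       }
   
       public static void main(String[] args) {
           int numPartitions = 8;
           List<String> skewed = new ArrayList<>();
           List<String> suffixed = new ArrayList<>();
           for (int i = 0; i < 1000; i++) {
               String sortValue = "2024-01-01";                 // every record shares one sort value
               String recordKey = String.format("key-%04d", i); // hypothetical unique record key
               skewed.add(sortValue);
               suffixed.add(sortValue + "|" + recordKey);
           }
           // prints [1000, 0]
           System.out.println(Arrays.toString(partitionSizes(skewed, determineBounds(skewed, numPartitions))));
           // prints [125, 125, 125, 125, 125, 125, 125, 125]
           System.out.println(Arrays.toString(partitionSizes(suffixed, determineBounds(suffixed, numPartitions))));
       }
   }
   ```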
   
   ### Impact
   
   None. This handles skew for partitioners that honour user-defined sort columns.
   
   ### Risk level (write none, low, medium or high below)
   
   Low. The new behaviour is gated behind this config, which defaults to false.
   
   ### Documentation Update
   
   ```
     public static final ConfigProperty<Boolean> BULKINSERT_SUFFIX_RECORD_KEY_FOR_USER_DEFINED_SORT_COLUMNS = ConfigProperty
         .key("hoodie.bulkinsert.suffix.record_key.user.defined.sort.columns")
         .defaultValue(false)
         .markAdvanced()
         .withDocumentation(
             "When using user-defined sort columns, skew in those columns can increase commit durations. "
                 + "Enabling this config appends the record key as the last sort column to avoid skew.");
   ```
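   
   A minimal sketch of how the flag could change the effective sort columns, assuming the record key is appended as Hudi's `_hoodie_record_key` meta column. `SortColumnsSketch`, `effectiveSortColumns`, and the `city`/`event_date` columns are hypothetical and not taken from this PR; the flag is read from plain `java.util.Properties` with the same `false` default as the config above.
   
   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.List;
   import java.util.Properties;
   
   // Hypothetical helper for illustration only, not Hudi's implementation.
   public class SortColumnsSketch {
   
       // Config key from this PR; absent means false, mirroring .defaultValue(false).
       static final String SUFFIX_RECORD_KEY_KEY =
           "hoodie.bulkinsert.suffix.record_key.user.defined.sort.columns";
   
       // Assumption: the record key is taken from Hudi's _hoodie_record_key meta column.
       static final String RECORD_KEY_COLUMN = "_hoodie_record_key";
   
       static List<String> effectiveSortColumns(List<String> userSortColumns, Properties props) {
           boolean suffixRecordKey =
               Boolean.parseBoolean(props.getProperty(SUFFIX_RECORD_KEY_KEY, "false"));
           List<String> columns = new ArrayList<>(userSortColumns);
           // Append the unique record key last, so records with equal user sort
           // values still get distinct sort keys that can be spread across tasks.
           if (suffixRecordKey && !columns.contains(RECORD_KEY_COLUMN)) {
               columns.add(RECORD_KEY_COLUMN);
           }
           return columns;
       }
   
       public static void main(String[] args) {
           List<String> userColumns = Arrays.asList("city", "event_date"); // hypothetical columns
           Properties props = new Properties();
           System.out.println(effectiveSortColumns(userColumns, props)); // prints [city, event_date]
           props.setProperty(SUFFIX_RECORD_KEY_KEY, "true");
           // prints [city, event_date, _hoodie_record_key]
           System.out.println(effectiveSortColumns(userColumns, props));
       }
   }
   ```
   
   Keeping the user columns first preserves their ordering; the record key only breaks ties among records with equal user sort values.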
   
   ### Contributor's checklist
   
   - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [x] Change Logs and Impact were stated clearly
   - [x] Adequate tests were added if applicable
   - [x] CI passed
   


-- 
This is an automated message from the Apache Git Service.