nsivabalan opened a new pull request, #7146:
URL: https://github.com/apache/hudi/pull/7146

   ### Change Logs
   
   Some users want incoming records sorted on chosen columns when writing with 
insert or upsert. As of now, sorting is supported only with bulk_insert. This 
patch adds sorting support for the insert and upsert operations as well. 
   
   Typical use-case:
   The classic problem of event time vs. query predicates. In the case of 
Uber's trip data, the dataset is partitioned on datestr, but most queries may 
filter on city_id. So, instead of relying on clustering to sort after the fact, 
this patch adds support for sorting the records during ingestion itself. 
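   A toy sketch of why sorting at ingestion helps such queries. This is plain 
Python, not Hudi code; the helper names (`split_into_files`, `files_to_scan`) 
and the sample trips are hypothetical, and it assumes a reader can skip a file 
whose min/max city_id range excludes the queried value.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def split_into_files(records: List[Record], records_per_file: int) -> List[List[Record]]:
    """Chunk records into fixed-size 'files' in the order they are given."""
    return [
        records[i:i + records_per_file]
        for i in range(0, len(records), records_per_file)
    ]


def files_to_scan(files: List[List[Record]], city_id: int) -> int:
    """Count files whose [min, max] city_id range could contain city_id."""
    count = 0
    for file_records in files:
        ids = [r["city_id"] for r in file_records]
        if min(ids) <= city_id <= max(ids):
            count += 1
    return count


# Hypothetical sample: one datestr partition, cities interleaved in arrival order.
incoming = [
    {"datestr": "2022-11-01", "city_id": i % 4, "trip_id": i}
    for i in range(12)
]

unsorted_files = split_into_files(incoming, 3)
sorted_files = split_into_files(sorted(incoming, key=lambda r: r["city_id"]), 3)

print(files_to_scan(unsorted_files, 2), files_to_scan(sorted_files, 2))
```

   In this toy data, every unsorted file spans the queried city_id, while the 
sorted layout confines it to a single file.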
   
   ### Impact
   
   Users will now be able to optionally sort records based on columns of their 
choice while ingesting records with insert or upsert. 
   Configs of interest:
   hoodie.write.sort.mode: possible values are NONE, GLOBAL_SORT and 
PARTITIONER_SORT
   hoodie.write.sort.cols: comma-separated list of columns to sort by. 
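   A minimal sketch of how these configs might be assembled for a write. The 
helper `sorted_write_options` is hypothetical and only builds and validates an 
options dictionary; the two sort keys are the ones proposed in this patch, and 
the check that sort columns are required for any mode other than NONE is an 
assumption, not confirmed behavior.

```python
from typing import Dict, List

# Values listed in this PR description for hoodie.write.sort.mode.
VALID_SORT_MODES = {"NONE", "GLOBAL_SORT", "PARTITIONER_SORT"}


def sorted_write_options(
    table_name: str,
    operation: str,
    sort_mode: str,
    sort_cols: List[str],
) -> Dict[str, str]:
    """Build writer options for a sorted insert/upsert (hypothetical helper)."""
    if operation not in {"insert", "upsert"}:
        raise ValueError(f"unsupported operation for this sketch: {operation}")
    if sort_mode not in VALID_SORT_MODES:
        raise ValueError(f"unknown sort mode: {sort_mode}")
    # Assumption: a sort mode other than NONE needs at least one column.
    if sort_mode != "NONE" and not sort_cols:
        raise ValueError("sort columns are required when sorting is enabled")
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": operation,
        "hoodie.write.sort.mode": sort_mode,
        "hoodie.write.sort.cols": ",".join(sort_cols),
    }


opts = sorted_write_options("trips", "upsert", "GLOBAL_SORT", ["city_id"])
for key, value in opts.items():
    print(f"{key}={value}")
```

   With PySpark, a dictionary like this could be passed along with the usual 
record key and partition path settings via 
`df.write.format("hudi").options(**opts)`.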
   
   ### Risk level (write none, low medium or high below)
   
   Medium
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change_
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
     ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.