lokeshj1703 opened a new pull request, #7668:
URL: https://github.com/apache/hudi/pull/7668

   ### Change Logs
   
   At present, Hudi requires a record key and a preCombine key to create a Hudi 
dataset, which restricts the kinds of datasets we can create 
using Hudi.
   
   To increase adoption of the Hudi file format across all kinds of 
derived datasets, similar to Parquet/ORC, we need to offer users more 
flexibility. The record key is used for the upsert primitive, and the 
preCombine key is needed to break ties and deduplicate. However, event data and 
other datasets without a primary key (append-only datasets) can also 
benefit from Hudi, since the Hudi ecosystem offers features such as snapshot 
isolation, indexes, clustering, and DeltaStreamer that can be applied to 
any dataset, even one without a record key.
   
   This proposal makes both the record key and the preCombine key 
optional, to allow a variety of new use cases on top of Hudi.
   
   ### Impact
   
   None
   
   ### Risk level (write none, low, medium or high below)
   
   Medium (Added tests)
   
   ### Documentation Update
   
   Adds support for KeylessGenerator for insert operations. With this change, users 
no longer need to configure a record key for inserts into an immutable dataset.
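   
   As a rough illustration of the difference from the user's side, the sketch 
below builds the write options for a keyed table and for an append-only table. 
The option keys are existing Hudi datasource configs; the helper name 
`hudi_write_options` and the sample table and field names are hypothetical, 
and no key generator class is set because its fully qualified name is not 
stated here.

```python
# Sketch only: the option keys are existing Hudi datasource configs, while the
# helper name and the sample table/field names are hypothetical.

def hudi_write_options(table_name, record_key=None, precombine_key=None):
    """Build Hudi write options, leaving out any key field that is not supplied."""
    options = {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "insert",
    }
    if record_key is not None:
        options["hoodie.datasource.write.recordkey.field"] = record_key
    if precombine_key is not None:
        options["hoodie.datasource.write.precombine.field"] = precombine_key
    return options


# Keyed dataset: both fields configured, as required today.
keyed = hudi_write_options("trips", record_key="uuid", precombine_key="ts")

# Append-only dataset: no record key or preCombine key configured.
keyless = hudi_write_options("click_events")
```

   With PySpark, such a dictionary would typically be passed as 
`df.write.format("hudi").options(**keyless)`; whether a write without a record 
key is accepted depends on this change being present in the Hudi version in use.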
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
