codope opened a new pull request, #11077:
URL: https://github.com/apache/hudi/pull/11077

   ### Change Logs
   
   This PR introduces a new class hierarchy for handling merge keys in a more 
flexible and decoupled manner. It adds the `HoodieMergeKey` interface, along 
with two implementations: `HoodieSimpleMergeKey` and `HoodieCompositeMergeKey`. 
This design allows us to extend key-based merge strategies easily.
   
   **Motivation**
   
   The new merge key mechanism is needed to support different types of keys 
(simple and complex) without overloading the existing `HoodieKey` class, which 
is central to the write path. Keeping merge key handling in its own hierarchy 
avoids potential conflicts and keeps modifications localised, which improves 
the maintainability of the code.
   
   **Changes**
   
   1. `HoodieMergeKey`: New interface that provides consistent handling of 
both simple keys and composite keys. It includes methods for retrieving the key 
and partition path.
   2. `HoodieSimpleMergeKey`: Wraps `HoodieKey` and implements the 
`HoodieMergeKey` interface for simple scenarios where the key is a string.
   3. `HoodieCompositeMergeKey`: Implements the `HoodieMergeKey` interface but 
allows complex types as keys, for scenarios where a simple string key is not 
sufficient.
   4. `HoodieMergeKeyBasedRecordMerger`: A new implementation of 
`HoodieRecordMerger` based on `HoodieMergeKey`. If the merge keys are of type 
`HoodieCompositeMergeKey`, then it returns the older and newer records. 
Otherwise, it calls the merge method from the parent class.
   5. `HoodieMergedLogRecordScanner`: Changes to merge based on 
`HoodieMergeKey`.
   6. Unit tests for the new merger.
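
   The hierarchy and merge rule listed above can be sketched roughly as 
follows. This is a minimal, standalone illustration and not the code in this PR: 
`MergeKeySketch`, `MergeKey`, `SimpleMergeKey`, `CompositeMergeKey` and the 
`merge` helper are hypothetical stand-ins, the method names are assumptions, and 
the "newer record wins" fallback is only a placeholder for whatever the parent 
merger actually does. It also assumes that "returns the older and newer 
records" means both records are kept.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

public class MergeKeySketch {

  /** Hypothetical stand-in for HoodieMergeKey: exposes the key and the partition path. */
  interface MergeKey {
    Object getRecordKey();
    String getPartitionPath();
  }

  /** Simple case: the record key is a plain string. */
  static final class SimpleMergeKey implements MergeKey {
    private final String recordKey;
    private final String partitionPath;

    SimpleMergeKey(String recordKey, String partitionPath) {
      this.recordKey = Objects.requireNonNull(recordKey);
      this.partitionPath = partitionPath;
    }

    @Override public String getRecordKey() { return recordKey; }
    @Override public String getPartitionPath() { return partitionPath; }
  }

  /** Composite case: the key can be any type K, not only a string. */
  static final class CompositeMergeKey<K> implements MergeKey {
    private final K compositeKey;
    private final String partitionPath;

    CompositeMergeKey(K compositeKey, String partitionPath) {
      this.compositeKey = Objects.requireNonNull(compositeKey);
      this.partitionPath = partitionPath;
    }

    @Override public K getRecordKey() { return compositeKey; }
    @Override public String getPartitionPath() { return partitionPath; }
  }

  /**
   * Assumed merge rule: composite keys keep both records (older, then newer);
   * any other key falls back to a placeholder "newer wins" merge.
   */
  static <R> List<R> merge(MergeKey key, R older, R newer) {
    if (key instanceof CompositeMergeKey) {
      return Arrays.asList(older, newer);
    }
    return Collections.singletonList(newer);
  }

  public static void main(String[] args) {
    MergeKey simple = new SimpleMergeKey("id-1", "2024/04/23");
    MergeKey composite =
        new CompositeMergeKey<>(Arrays.asList("id-1", "region-7"), "2024/04/23");

    System.out.println(merge(simple, "v1", "v2"));     // prints [v2]
    System.out.println(merge(composite, "v1", "v2"));  // prints [v1, v2]
  }
}
```

   Dispatching on the key type keeps the simple-key path unchanged, which is 
the behaviour the description above attributes to the parent class.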
   
   These changes do not affect existing functionality that does not rely on 
merge keys. They introduce additional classes that are used explicitly by new 
functionality involving various key types in merge operations, so the risk to 
existing processes is minimal.
   
   ### Impact
   
   This change enhances the flexibility and robustness of key-based merge 
strategies. It helps keep the codebase scalable and maintainable, allowing easy 
extensions and modifications in the future.
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   _Describe any necessary documentation update if there is any new feature, 
config, or user-facing change. If not, put "none"._
   
   - _The config description must be updated if new configs are added or the 
default value of the configs are changed_
   - _Any new feature or user-facing change requires updating the Hudi website. 
Please create a Jira ticket, attach the
     ticket number here and follow the 
[instruction](https://hudi.apache.org/contribute/developer-setup#website) to 
make
     changes to the website._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   

