codope opened a new pull request, #11077: URL: https://github.com/apache/hudi/pull/11077
### Change Logs

This PR introduces a new class hierarchy for handling merge keys in a more flexible and decoupled manner. It adds the `HoodieMergeKey` interface, along with two implementations: `HoodieSimpleMergeKey` and `HoodieCompositeMergeKey`. This design makes it easy to extend key-based merge strategies.

**Motivation**

The new merge key mechanism is needed to support different types of keys (simple and complex) without overloading the existing `HoodieKey` class, which is central to the write path. Segregating merge key handling into its own hierarchy avoids potential conflicts and keeps modifications localised, which improves the maintainability of the code.

**Changes**

1. `HoodieMergeKey`: New API that ensures consistent handling of both simple keys and composite keys. It includes methods for retrieving the key and the partition path.
2. `HoodieSimpleMergeKey`: Wraps `HoodieKey` and implements the `HoodieMergeKey` interface for simple scenarios where the key is a string.
3. `HoodieCompositeMergeKey`: Implements the `HoodieMergeKey` interface but allows complex types as keys, for scenarios where a simple string key is not sufficient.
4. `HoodieMergeKeyBasedRecordMerger`: A new implementation of `HoodieRecordMerger` based on `HoodieMergeKey`. If the merge keys are of type `HoodieCompositeMergeKey`, it returns both the older and the newer records. Otherwise, it calls the merge method from the parent class.
5. `HoodieMergedLogRecordScanner`: Changes to merge based on `HoodieMergeKey`.
6. Unit tests for the new merger.

These changes do not affect existing functionality that does not rely on merge keys. They introduce additional classes that are used explicitly for new functionality involving various key types in merge operations, so the risk to existing processes is minimal.

### Impact

This change enhances the flexibility and robustness of our key-based merge strategies.
It helps keep our codebase scalable and maintainable, allowing easy extensions and modifications in the future.

### Risk level (write none, low, medium or high below)

low

### Documentation Update

_Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none"._

- _The config description must be updated if new configs are added or the default value of the configs are changed_
- _Any new feature or user-facing change requires updating the Hudi website. Please create a Jira ticket, attach the ticket number here and follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website._

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed
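
For readers who want a concrete picture of the hierarchy listed under **Changes**, here is a minimal, self-contained sketch. It is not the code in this PR. The types `MergeKey`, `SimpleMergeKey`, `CompositeMergeKey` and `MergeKeyDispatch` are hypothetical stand-ins for `HoodieMergeKey`, `HoodieSimpleMergeKey`, `HoodieCompositeMergeKey` and `HoodieMergeKeyBasedRecordMerger`. The accessor names, the use of plain strings as records, and the "newer record wins" fallback are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for HoodieMergeKey: exposes a key and a partition path.
// Accessor names are assumptions, not the PR's actual signatures.
interface MergeKey {
    Object getRecordKey();
    String getPartitionPath();
}

// Stand-in for HoodieSimpleMergeKey: the key is a plain string.
final class SimpleMergeKey implements MergeKey {
    private final String recordKey;
    private final String partitionPath;

    SimpleMergeKey(String recordKey, String partitionPath) {
        this.recordKey = recordKey;
        this.partitionPath = partitionPath;
    }

    @Override public Object getRecordKey() { return recordKey; }
    @Override public String getPartitionPath() { return partitionPath; }
}

// Stand-in for HoodieCompositeMergeKey: the key may be any type K.
final class CompositeMergeKey<K> implements MergeKey {
    private final K compositeKey;
    private final String partitionPath;

    CompositeMergeKey(K compositeKey, String partitionPath) {
        this.compositeKey = compositeKey;
        this.partitionPath = partitionPath;
    }

    @Override public Object getRecordKey() { return compositeKey; }
    @Override public String getPartitionPath() { return partitionPath; }
}

// Stand-in for the dispatch described for HoodieMergeKeyBasedRecordMerger.
final class MergeKeyDispatch {
    private MergeKeyDispatch() { }

    // Returns the records that remain after merging; records are plain strings here.
    static List<String> merge(MergeKey olderKey, String older,
                              MergeKey newerKey, String newer) {
        List<String> result = new ArrayList<>();
        if (olderKey instanceof CompositeMergeKey<?>
                && newerKey instanceof CompositeMergeKey<?>) {
            // Composite keys: keep both the older and the newer record.
            result.add(older);
            result.add(newer);
        } else {
            // Placeholder for "call the parent merger"; assumed here to keep the newer record.
            result.add(newer);
        }
        return result;
    }
}
```

With two `SimpleMergeKey` arguments, `MergeKeyDispatch.merge` returns a single-element list holding the newer record. With two `CompositeMergeKey` arguments, it returns both records in older-then-newer order.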