bhat-vinay opened a new pull request, #10617: URL: https://github.com/apache/hudi/pull/10617
This PR is borrowed from PR 10211 (https://github.com/apache/hudi/pull/10211), with some fixes in the read path. It lays the groundwork for supporting duplicate (non-unique) keys in the HFile reader and writer for metadata tables. The writer side is fully supported. The reader side still needs additional support to merge base file and log records when there are duplicate keys, which will come in a subsequent PR. This PR also fixes some minor reader issues that were causing RecordIndex unit tests to fail.

### Change Logs

A major requirement for supporting a secondary (non-unique) index is that the metadata table can hold duplicate keys. This in turn means that the HFile (and delta log files) for the metadata table should support returning a list of records for a given key. This PR implements that. The subsequent PR that adds reader support will also add some additional entries (for secondary keys) to the metadata schema.

### Impact

The metadata table writers can write duplicate keys.

### Risk level (write none, low medium or high below)

Medium. All tests pass.

### Documentation Update

None

### Contributor's checklist

- [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
- [ ] Change Logs and Impact were stated clearly
- [ ] Adequate tests were added if applicable
- [ ] CI passed

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
