bhat-vinay opened a new pull request, #10617:
URL: https://github.com/apache/hudi/pull/10617

   This PR is borrowed from PR 10211 
(https://github.com/apache/hudi/pull/10211). It lays the groundwork for 
supporting duplicate (non-unique) keys in the HFile reader and writer (for 
metadata tables). The writer part is fully supported, but the reader side 
requires some additional support to merge base file and log records when there 
are duplicate keys (which will come in a subsequent PR). This PR also fixes 
some minor issues in the reader that were causing a RecordIndex unit test to fail.
   
   ### Change Logs
   
   A major requirement for supporting a secondary (non-unique) index is the 
ability of the metadata table to hold duplicate keys. This in turn means that 
the HFile (and delta log files) for the metadata table should support returning 
a list of records for a given key. This PR implements that capability; it is 
borrowed from https://github.com/apache/hudi/pull/10211 (with some fixes in the 
read path).
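
   As a rough illustration of what "returning a list of records for a given 
key" means for a key-sorted file, here is a minimal sketch. It is not Hudi's 
HFile reader or writer code: the class `DuplicateKeyLookupSketch`, its `Entry` 
type, and the `getAll` method are hypothetical names made up for this example, 
and the in-memory sorted list only stands in for a sorted file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch, not Hudi's API: an in-memory stand-in for a
// key-sorted file in which several entries may share the same key.
public class DuplicateKeyLookupSketch {

    /** One key/value pair; the key is not required to be unique. */
    public static final class Entry {
        final String key;
        final String value;

        public Entry(String key, String value) {
            this.key = key;
            this.value = value;
        }

        String getKey() {
            return key;
        }
    }

    private final List<Entry> sorted = new ArrayList<>();

    public DuplicateKeyLookupSketch(List<Entry> entries) {
        sorted.addAll(entries);
        // List.sort is stable, so entries with equal keys keep insertion order.
        sorted.sort(Comparator.comparing(Entry::getKey));
    }

    /** Returns every value stored under the key, or an empty list if none. */
    public List<String> getAll(String key) {
        // Binary search for the first entry whose key is >= the lookup key.
        int lo = 0;
        int hi = sorted.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted.get(mid).key.compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        // Duplicates are adjacent after sorting, so scan forward while the key matches.
        List<String> values = new ArrayList<>();
        for (int i = lo; i < sorted.size() && sorted.get(i).key.equals(key); i++) {
            values.add(sorted.get(i).value);
        }
        return values;
    }

    public static void main(String[] args) {
        DuplicateKeyLookupSketch store = new DuplicateKeyLookupSketch(Arrays.asList(
                new Entry("k1", "v1"),
                new Entry("k2", "v2a"),
                new Entry("k3", "v3"),
                new Entry("k2", "v2b")));
        System.out.println(store.getAll("k2")); // prints [v2a, v2b]
        System.out.println(store.getAll("k9")); // prints []
    }
}
```

   The point of the sketch is only the lookup contract: a unique-key reader can 
stop at the first match, whereas a duplicate-key reader has to keep scanning 
until the key changes.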
   
   This lays the groundwork for supporting duplicate (non-unique) keys in 
the HFile reader and writer (for metadata tables). The writer part is fully 
supported, but the reader side requires some additional support to merge base 
file and log records when there are duplicate keys. The reader support will 
come in a subsequent PR, which will add some additional entries (for secondary 
keys) to the metadata schema. This PR also fixes some minor issues in the 
reader that were causing a RecordIndex unit test to fail.
   
   ### Impact
   
   Metadata table writers can now write duplicate keys.
   
   ### Risk level (write none, low medium or high below)
   
   Medium. All tests pass.
   
   ### Documentation Update
   
   None
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's 
guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
