[ 
https://issues.apache.org/jira/browse/HUDI-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kate Huber updated HUDI-8489:
-----------------------------
    Sprint: Hudi 1.0 Blockers+Bugs Sprint

> Fix encoding of secondary index key
> -----------------------------------
>
>                 Key: HUDI-8489
>                 URL: https://issues.apache.org/jira/browse/HUDI-8489
>             Project: Apache Hudi
>          Issue Type: Task
>            Reporter: Sagar Sumit
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> The secondary index key is a combination of secondaryKey and recordKey. 
> There are two ways to encode it with a delimiter ($):
>  # Base64 encoding: `Base64.encode(secondaryKey) + DELIMITER + 
> Base64.encode(recordKey)`. The Base64 alphabet does not include $, so the 
> delimiter can never appear inside an encoded key. This is a neat, standard 
> scheme, though it may be less efficient for long strings (Base64 output is 
> about a third larger than its input).
>  # Escape special characters: `escapeSpecialChars(secondaryKey) + DELIMITER 
> + escapeSpecialChars(recordKey)`. The keys stay readable and the encoding 
> preserves the order. This is a custom scheme not used in other systems.
> I ran a benchmark to compare encoding/decoding time and did not find much 
> difference - https://gist.github.com/codope/b1c73abed748d77c0b4db974d835f9da
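
The two options above can be sketched in Java roughly as follows. This is a minimal illustration, not the Hudi implementation: the class name, the method names, and the choice of backslash as the escape character are all assumptions made for the example. Only `java.util.Base64` and `StandardCharsets` are real JDK APIs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper; names are illustrative and not taken from Hudi.
final class SecondaryIndexKeySketch {
    static final char DELIMITER = '$';
    // Assumption: backslash is the escape character for option 2.
    static final char ESCAPE = '\\';

    // Option 1: Base64-encode each part, then join with the delimiter.
    // The standard Base64 alphabet never produces '$'.
    static String encodeBase64(String secondaryKey, String recordKey) {
        Base64.Encoder enc = Base64.getEncoder();
        return enc.encodeToString(secondaryKey.getBytes(StandardCharsets.UTF_8))
            + DELIMITER
            + enc.encodeToString(recordKey.getBytes(StandardCharsets.UTF_8));
    }

    // Splits on the only '$' in the string and decodes both halves.
    static String[] decodeBase64(String encoded) {
        int idx = encoded.indexOf(DELIMITER);
        Base64.Decoder dec = Base64.getDecoder();
        return new String[] {
            new String(dec.decode(encoded.substring(0, idx)), StandardCharsets.UTF_8),
            new String(dec.decode(encoded.substring(idx + 1)), StandardCharsets.UTF_8)
        };
    }

    // Option 2: prefix every delimiter and escape character with the escape.
    static String escapeSpecialChars(String key) {
        StringBuilder sb = new StringBuilder(key.length() + 4);
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c == ESCAPE || c == DELIMITER) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String encodeEscaped(String secondaryKey, String recordKey) {
        return escapeSpecialChars(secondaryKey) + DELIMITER + escapeSpecialChars(recordKey);
    }

    // The first unescaped delimiter separates the two parts.
    static String[] decodeEscaped(String encoded) {
        StringBuilder[] parts = { new StringBuilder(), new StringBuilder() };
        int part = 0;
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == ESCAPE && i + 1 < encoded.length()) {
                parts[part].append(encoded.charAt(++i)); // take escaped char literally
            } else if (c == DELIMITER && part == 0) {
                part = 1;
            } else {
                parts[part].append(c);
            }
        }
        return new String[] { parts[0].toString(), parts[1].toString() };
    }
}
```

Both variants round-trip keys that themselves contain `$`. For example, with this sketch `encodeEscaped("city$NYC", "id$42")` yields `city\$NYC$id\$42`, which is still human-readable, while the Base64 form is opaque but needs no custom escaping rules.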



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
