[ https://issues.apache.org/jira/browse/HUDI-8489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kate Huber updated HUDI-8489:
-----------------------------
Sprint: Hudi 1.0 Blockers+Bugs Sprint
> Fix encoding of secondary index key
> -----------------------------------
>
> Key: HUDI-8489
> URL: https://issues.apache.org/jira/browse/HUDI-8489
> Project: Apache Hudi
> Issue Type: Task
> Reporter: Sagar Sumit
> Priority: Blocker
> Fix For: 1.0.0
>
>
> The secondary index key is a combination of secondaryKey and recordKey. There
> are two ways to encode it with a delimiter ($):
> # Base64-encode each part: `Base64.encode(secondaryKey) + DELIMITER +
> Base64.encode(recordKey)`. The Base64 alphabet does not contain $, so the
> delimiter can never appear inside an encoded part. This is a neat, standard
> scheme, though it might not be very efficient for long strings.
> # Escape special characters: `escapeSpecialChars(secondaryKey) + DELIMITER
> + escapeSpecialChars(recordKey)`. The keys stay readable and their order is
> preserved, but this is a custom scheme not used in other systems.
> I ran a benchmark comparing encoding and decoding time for the two options
> and did not find much difference:
> https://gist.github.com/codope/b1c73abed748d77c0b4db974d835f9da
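
For illustration only, the two options in the description could be sketched in Java roughly as follows. This is not the Hudi implementation: the class name SecondaryIndexKeySketch, the method names, and the choice of backslash as the escape character are all assumptions made for this sketch. Only the `$` delimiter and the overall shape of each scheme come from the issue text.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical sketch of the two encodings discussed in the issue.
// All names here are illustrative, not Hudi APIs.
final class SecondaryIndexKeySketch {
  static final char DELIMITER = '$';
  static final char ESCAPE = '\\'; // assumed escape character

  // Option 1: Base64-encode each part, then join with the delimiter.
  static String encodeBase64(String secondaryKey, String recordKey) {
    return b64(secondaryKey) + DELIMITER + b64(recordKey);
  }

  static String[] decodeBase64(String key) {
    // '$' is not in the Base64 alphabet, so the first '$' is the delimiter.
    int i = key.indexOf(DELIMITER);
    return new String[] {unb64(key.substring(0, i)), unb64(key.substring(i + 1))};
  }

  private static String b64(String s) {
    return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
  }

  private static String unb64(String s) {
    return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
  }

  // Option 2: escape the delimiter and the escape character in each part.
  static String escapeSpecialChars(String s) {
    StringBuilder sb = new StringBuilder(s.length() + 4);
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == DELIMITER || c == ESCAPE) {
        sb.append(ESCAPE);
      }
      sb.append(c);
    }
    return sb.toString();
  }

  static String encodeEscaped(String secondaryKey, String recordKey) {
    return escapeSpecialChars(secondaryKey) + DELIMITER + escapeSpecialChars(recordKey);
  }

  static String[] decodeEscaped(String key) {
    StringBuilder current = new StringBuilder();
    String first = null;
    for (int i = 0; i < key.length(); i++) {
      char c = key.charAt(i);
      if (c == ESCAPE && i + 1 < key.length()) {
        current.append(key.charAt(++i)); // take the escaped character literally
      } else if (c == DELIMITER && first == null) {
        first = current.toString();      // first unescaped '$' splits the key
        current.setLength(0);
      } else {
        current.append(c);
      }
    }
    return new String[] {first, current.toString()};
  }

  public static void main(String[] args) {
    System.out.println(encodeBase64("abc", "row$1"));
    System.out.println(encodeEscaped("a$b", "row$1"));
  }
}
```

In this sketch both decoders split on the first unescaped `$`, so a secondaryKey or recordKey that itself contains `$` still round-trips under either option.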