Sagar Sumit created HUDI-8489:
---------------------------------
Summary: Fix encoding of secondary index key
Key: HUDI-8489
URL: https://issues.apache.org/jira/browse/HUDI-8489
Project: Apache Hudi
Issue Type: Task
Reporter: Sagar Sumit
Fix For: 1.0.0
The secondary index key is a combination of secondaryKey and recordKey. There are
two ways to encode it with a delimiter ($):
# Base64-encode each part: `Base64.encode(secondaryKey) + DELIMITER +
Base64.encode(recordKey)`. The Base64 alphabet does not contain $, so the
delimiter stays unambiguous. This is a neat, standard scheme, though it might
not be very efficient for long strings, since the encoded output is about a
third larger than the input.
# Escape special characters: `escapeSpecialChars(secondaryKey) + DELIMITER +
escapeSpecialChars(recordKey)`. The keys stay readable and preserve their order.
However, this is a custom scheme not used in other systems.
Ran a benchmark comparing encoding/decoding time for the two options and did
not find much difference: https://gist.github.com/codope/b1c73abed748d77c0b4db974d835f9da
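
For discussion, here is a minimal sketch of the two options. It is not the Hudi implementation: the class name `SecondaryIndexKeySketch`, the method names, and the choice of backslash as the escape character are all assumptions made for illustration. Only `java.util.Base64` from the JDK is used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only; names and the escape character are assumptions.
final class SecondaryIndexKeySketch {
    static final char DELIMITER = '$';
    static final char ESCAPE = '\\'; // assumed escape character

    // Option 1: Base64-encode each part, then join with the delimiter.
    static String encodeBase64(String secondaryKey, String recordKey) {
        Base64.Encoder enc = Base64.getEncoder();
        return enc.encodeToString(secondaryKey.getBytes(StandardCharsets.UTF_8))
            + DELIMITER
            + enc.encodeToString(recordKey.getBytes(StandardCharsets.UTF_8));
    }

    // The Base64 alphabet has no '$', so the first '$' is the delimiter.
    static String[] decodeBase64(String key) {
        int i = key.indexOf(DELIMITER);
        Base64.Decoder dec = Base64.getDecoder();
        return new String[] {
            new String(dec.decode(key.substring(0, i)), StandardCharsets.UTF_8),
            new String(dec.decode(key.substring(i + 1)), StandardCharsets.UTF_8)
        };
    }

    // Option 2: prefix every delimiter or escape character with the escape character.
    static String escapeSpecialChars(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 4);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == DELIMITER || c == ESCAPE) {
                sb.append(ESCAPE);
            }
            sb.append(c);
        }
        return sb.toString();
    }

    static String encodeEscaped(String secondaryKey, String recordKey) {
        return escapeSpecialChars(secondaryKey) + DELIMITER + escapeSpecialChars(recordKey);
    }

    // Split on the first unescaped delimiter and drop the escape characters.
    static String[] decodeEscaped(String key) {
        StringBuilder[] parts = { new StringBuilder(), new StringBuilder() };
        int part = 0;
        for (int i = 0; i < key.length(); i++) {
            char c = key.charAt(i);
            if (c == ESCAPE && i + 1 < key.length()) {
                parts[part].append(key.charAt(++i)); // keep the escaped character as-is
            } else if (c == DELIMITER && part == 0) {
                part = 1; // unescaped delimiter: switch to the record key
            } else {
                parts[part].append(c);
            }
        }
        return new String[] { parts[0].toString(), parts[1].toString() };
    }
}
```

Both variants round-trip keys that themselves contain `$`. The difference is in what the stored key looks like: the Base64 form is opaque, while the escaped form stays human-readable.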