[
https://issues.apache.org/jira/browse/HUDI-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704925#comment-17704925
]
Wally Tang edited comment on HUDI-5982 at 3/25/23 10:54 AM:
------------------------------------------------------------
My idea is that if the user's primary key data contains ",", we can replace it
with "__commas__" when generating the recordKey. When the user wants to retrieve
the real primary key data from the recordKey, they can replace "__commas__"
with ",".
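
A minimal sketch of this replacement, for illustration only. The class and method names (RecordKeyCommaEscaper, escape, unescape) are hypothetical and are not part of Hudi; only the "__commas__" placeholder comes from the proposal above.

```java
// Hypothetical helper, not an existing Hudi class.
final class RecordKeyCommaEscaper {
    // Placeholder proposed in the comment above.
    static final String COMMA_PLACEHOLDER = "__commas__";

    // Applied to each primary key value before it is written into the recordKey.
    static String escape(String value) {
        return value.replace(",", COMMA_PLACEHOLDER);
    }

    // Applied when the original primary key value is read back from the recordKey.
    static String unescape(String escaped) {
        return escaped.replace(COMMA_PLACEHOLDER, ",");
    }

    public static void main(String[] args) {
        String raw = "Smith, John";
        String escaped = escape(raw);
        // Composite key in the field:value,field:value layout that is split on ",".
        String recordKey = "name:" + escaped + ",id:42";
        System.out.println(recordKey);          // name:Smith__commas__ John,id:42
        System.out.println(unescape(escaped));  // Smith, John
    }
}
```

One caveat with this sketch: a value that already contains the literal text "__commas__" would not round-trip, since unescape would turn it into a comma.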
was (Author: tangshangwen):
My idea is that if the user's primary key data contains ",", we can replace it
with "__commas__" when generating the recordKey. When the user wants to
retrieve the real primary key data from the recordKey, they can replace
"__commas__" with ",".
> When the user's primary key data contains commas, BucketIdentifier cannot be
> used
> ---------------------------------------------------------------------------------
>
> Key: HUDI-5982
> URL: https://issues.apache.org/jira/browse/HUDI-5982
> Project: Apache Hudi
> Issue Type: Bug
> Components: index
> Affects Versions: 0.12.0
> Reporter: Wally Tang
> Priority: Major
>
> In the scenario of using composite primary keys and bucket index in a Hudi
> table, BucketIdentifier splits the recordKey using commas as a delimiter.
> This can cause exceptions to occur if the user's primary key data contains
> commas.
> {code:java}
> // BucketIdentifier.java
> private static List<String> getHashKeysUsingIndexFields(String recordKey,
>     List<String> indexKeyFields) {
>   Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
>       .map(p -> p.split(":"))
>       .collect(Collectors.toMap(p -> p[0], p -> p[1]));
>   return indexKeyFields.stream()
>       .map(recordKeyPairs::get).collect(Collectors.toList());
> }
> {code}
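
The failure described in the issue can be reproduced outside Hudi with a small standalone sketch. The method body is copied from the snippet quoted above; the wrapper class BucketKeyCommaRepro and the sample keys are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical standalone wrapper, not an existing Hudi class.
final class BucketKeyCommaRepro {
    // Same logic as the BucketIdentifier snippet quoted in the issue.
    static List<String> getHashKeysUsingIndexFields(String recordKey,
                                                    List<String> indexKeyFields) {
        Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
            .map(p -> p.split(":"))
            .collect(Collectors.toMap(p -> p[0], p -> p[1]));
        return indexKeyFields.stream()
            .map(recordKeyPairs::get).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("id", "name");

        // No comma in the data: each "field:value" pair parses cleanly.
        System.out.println(getHashKeysUsingIndexFields("id:1,name:Smith", fields)); // [1, Smith]

        // Comma in the data: the split yields the fragment " John", which has
        // no ":" separator, so reading p[1] goes out of bounds.
        try {
            getHashKeysUsingIndexFields("id:1,name:Smith, John", fields);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("failed: " + e);
        }
    }
}
```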
--
This message was sent by Atlassian Jira
(v8.20.10#820010)