[ https://issues.apache.org/jira/browse/HUDI-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704942#comment-17704942 ]
Sagar Sumit commented on HUDI-5982:
-----------------------------------
Is it common to have commas in a primary key field name? In my opinion, it
should be fixed upstream.
> When the user's primary key data contains commas, BucketIdentifier cannot be
> used
> ---------------------------------------------------------------------------------
>
> Key: HUDI-5982
> URL: https://issues.apache.org/jira/browse/HUDI-5982
> Project: Apache Hudi
> Issue Type: Bug
> Components: index
> Affects Versions: 0.12.0
> Reporter: Wally Tang
> Priority: Major
>
> In the scenario of using composite primary keys and bucket index in a Hudi
> table, BucketIdentifier splits the recordKey using commas as a delimiter.
> This can cause exceptions to occur if the user's primary key data contains
> commas.
> {code:java}
> // BucketIdentifier.java
> private static List<String> getHashKeysUsingIndexFields(String recordKey,
>     List<String> indexKeyFields) {
>   Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
>       .map(p -> p.split(":"))
>       .collect(Collectors.toMap(p -> p[0], p -> p[1]));
>   return indexKeyFields.stream()
>       .map(recordKeyPairs::get).collect(Collectors.toList());
> }
> {code}
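
For illustration, the failure described above can be reproduced outside Hudi with a small standalone sketch. It assumes the composite record key has the form field1:value1,field2:value2, which is what the quoted parsing code implies. The class name RecordKeySplitSketch, the method names, and the sample key are hypothetical. The fieldAwareHashKeys method is only one possible workaround, not the fix adopted in Hudi: it starts a new pair only where a comma is followed by the next known field name and a colon, and it assumes the fields appear in the key in the listed order. A value that itself contains such a sequence would still be ambiguous.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RecordKeySplitSketch {

  // Same parsing logic as the snippet quoted in the issue.
  static List<String> naiveHashKeys(String recordKey, List<String> indexKeyFields) {
    Map<String, String> pairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream().map(pairs::get).collect(Collectors.toList());
  }

  // Hypothetical alternative: a value ends only at ",<nextField>:" or at the end of the key.
  static List<String> fieldAwareHashKeys(String recordKey,
                                         List<String> recordKeyFields,
                                         List<String> indexKeyFields) {
    Map<String, String> pairs = new LinkedHashMap<>();
    int pos = 0;
    for (int i = 0; i < recordKeyFields.size(); i++) {
      String prefix = recordKeyFields.get(i) + ":";
      if (!recordKey.startsWith(prefix, pos)) {
        throw new IllegalArgumentException("Expected '" + prefix + "' at offset " + pos);
      }
      int valueStart = pos + prefix.length();
      int valueEnd = recordKey.length();
      if (i + 1 < recordKeyFields.size()) {
        String nextMarker = "," + recordKeyFields.get(i + 1) + ":";
        valueEnd = recordKey.indexOf(nextMarker, valueStart);
        if (valueEnd < 0) {
          throw new IllegalArgumentException("Missing field " + recordKeyFields.get(i + 1));
        }
      }
      pairs.put(recordKeyFields.get(i), recordKey.substring(valueStart, valueEnd));
      pos = valueEnd + 1; // skip the comma that precedes the next field name
    }
    return indexKeyFields.stream().map(pairs::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String recordKey = "id:1,name:Doe, John"; // the value of "name" contains a comma
    try {
      naiveHashKeys(recordKey, List.of("name"));
    } catch (ArrayIndexOutOfBoundsException e) {
      // " John" has no ':' so p[1] does not exist
      System.out.println("naive split failed: " + e);
    }
    System.out.println(fieldAwareHashKeys(recordKey, List.of("id", "name"), List.of("name")));
  }
}
```

The naive version fails because the fragment after the embedded comma contains no colon, so indexing p[1] throws. The field-aware version needs the full ordered list of record key fields, which the original method signature does not receive, so applying the idea in Hudi would require passing that list in or escaping delimiters when the key is generated.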