[jira] [Commented] (HUDI-5982) When the user's primary key data contains commas, BucketIdentifier cannot be used

Sagar Sumit (Jira) Sat, 25 Mar 2023 07:39:05 -0700


    [ https://issues.apache.org/jira/browse/HUDI-5982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17704942#comment-17704942 ]


Sagar Sumit commented on HUDI-5982:
-----------------------------------

Is it common to have commas in a primary key field name? In my opinion, it 
should be fixed upstream.

> When the user's primary key data contains commas, BucketIdentifier cannot be 
> used
> ---------------------------------------------------------------------------------
>
>                 Key: HUDI-5982
>                 URL: https://issues.apache.org/jira/browse/HUDI-5982
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: index
>    Affects Versions: 0.12.0
>            Reporter: Wally Tang
>            Priority: Major
>
> In the scenario of using composite primary keys and bucket index in a Hudi 
> table, BucketIdentifier splits the recordKey using commas as a delimiter. 
> This can cause exceptions to occur if the user's primary key data contains 
> commas.
> {code:java}
> // BucketIdentifier.java
> private static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
>   Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
>       .map(p -> p.split(":"))
>       .collect(Collectors.toMap(p -> p[0], p -> p[1]));
>   return indexKeyFields.stream()
>       .map(recordKeyPairs::get).collect(Collectors.toList());
> }
> {code}
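
To make the reported failure concrete, here is a minimal standalone sketch that copies the splitting logic quoted above into a hypothetical class (RecordKeySplitDemo and failsToParse are illustrative names, not part of Hudi). It assumes the composite record key is encoded as comma-separated field:value pairs such as "id:1,name:foo", which is inferred from the split calls in the snippet rather than confirmed by the issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical demo class; only the body of getHashKeysUsingIndexFields
// is taken from the snippet quoted in the issue.
public class RecordKeySplitDemo {

  // Same splitting logic as the quoted BucketIdentifier snippet.
  static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
    Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream()
        .map(recordKeyPairs::get).collect(Collectors.toList());
  }

  // Illustrative helper: true if the splitting logic throws for this key.
  static boolean failsToParse(String recordKey, List<String> indexKeyFields) {
    try {
      getHashKeysUsingIndexFields(recordKey, indexKeyFields);
      return false;
    } catch (RuntimeException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    List<String> fields = Arrays.asList("id", "name");

    // Assumed key format "field:value,field:value"; values without commas parse fine.
    System.out.println(getHashKeysUsingIndexFields("id:1,name:foo", fields)); // prints [1, foo]

    // The value "foo,bar" contains a comma, so the fragment "bar" has no ':'
    // and p[1] is out of bounds.
    System.out.println(failsToParse("id:1,name:foo,bar", fields)); // prints true

    // If the stray fragment happens to contain ':', nothing is thrown,
    // but the value of "name" is silently truncated to "a".
    System.out.println(getHashKeysUsingIndexFields("id:1,name:a,b:c", fields)); // prints [1, a]
  }
}
```

In this sketch the second key fails with an ArrayIndexOutOfBoundsException raised inside the collector, while the third shows that a comma in the data can also yield a wrong hash key without any exception.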



--
This message was sent by Atlassian Jira
(v8.20.10#820010)