Wally Tang created HUDI-5982:
--------------------------------

             Summary: When the user's primary key data contains commas, 
BucketIdentifier cannot be used
                 Key: HUDI-5982
                 URL: https://issues.apache.org/jira/browse/HUDI-5982
             Project: Apache Hudi
          Issue Type: Bug
          Components: index
    Affects Versions: 0.12.0
            Reporter: Wally Tang


When a Hudi table uses a composite primary key together with the bucket index, 
BucketIdentifier splits the recordKey on commas to recover the field:value 
pairs. If a primary key value itself contains a comma, the split can yield a 
fragment with no colon, in which case p[1] in the code below throws an 
ArrayIndexOutOfBoundsException.
{code:java}
// BucketIdentifier.java
private static List<String> getHashKeysUsingIndexFields(String recordKey, List<String> indexKeyFields) {
  Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
      .map(p -> p.split(":"))
      .collect(Collectors.toMap(p -> p[0], p -> p[1]));
  return indexKeyFields.stream()
      .map(recordKeyPairs::get).collect(Collectors.toList());
}
{code}
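
To make the failure easy to reproduce outside Hudi, the standalone sketch below 
copies the splitting logic quoted above into a method named naiveHashKeys and 
feeds it a key whose value contains a comma. The class name RecordKeySplitDemo, 
the sample key, and the fieldAwareHashKeys method are assumptions made up for 
illustration. They are not part of Hudi and not a proposed patch. 
fieldAwareHashKeys only shows one possible direction: use the known, ordered 
record key field names to find each "field:" marker instead of splitting on 
every comma.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class RecordKeySplitDemo {

  // Same logic as the snippet quoted in this issue, copied so it runs standalone.
  static List<String> naiveHashKeys(String recordKey, List<String> indexKeyFields) {
    Map<String, String> recordKeyPairs = Arrays.stream(recordKey.split(","))
        .map(p -> p.split(":"))
        .collect(Collectors.toMap(p -> p[0], p -> p[1]));
    return indexKeyFields.stream()
        .map(recordKeyPairs::get).collect(Collectors.toList());
  }

  // Hypothetical alternative (not Hudi code): walk the key using the ordered
  // record key field names, so commas inside a value are left untouched.
  static List<String> fieldAwareHashKeys(String recordKey,
                                         List<String> recordKeyFields,
                                         List<String> indexKeyFields) {
    Map<String, String> pairs = new LinkedHashMap<>();
    int pos = 0;
    for (int i = 0; i < recordKeyFields.size(); i++) {
      String field = recordKeyFields.get(i);
      String prefix = field + ":";
      if (!recordKey.startsWith(prefix, pos)) {
        throw new IllegalArgumentException("Expected '" + prefix + "' at offset " + pos);
      }
      int valueStart = pos + prefix.length();
      int valueEnd = recordKey.length();
      if (i + 1 < recordKeyFields.size()) {
        // The value ends where the next ",<field>:" marker begins.
        String nextMarker = "," + recordKeyFields.get(i + 1) + ":";
        valueEnd = recordKey.indexOf(nextMarker, valueStart);
        if (valueEnd < 0) {
          throw new IllegalArgumentException("Missing marker '" + nextMarker + "'");
        }
      }
      pairs.put(field, recordKey.substring(valueStart, valueEnd));
      pos = valueEnd + 1; // skip the comma that precedes the next field
    }
    return indexKeyFields.stream().map(pairs::get).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    String recordKey = "id:1,name:Doe, John"; // the name value contains a comma
    List<String> fields = Arrays.asList("id", "name");
    try {
      naiveHashKeys(recordKey, fields);
    } catch (ArrayIndexOutOfBoundsException e) {
      System.out.println("naive split failed: " + e);
    }
    System.out.println(fieldAwareHashKeys(recordKey, fields, fields));
  }
}
```

The field-aware variant is still ambiguous when a value happens to contain the 
text ",name:" itself, so escaping the delimiters when the key is generated 
would likely be the more robust fix.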



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
