[
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410727#comment-17410727
]
ASF GitHub Bot commented on HUDI-1951:
--------------------------------------
minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r703004865
##########
File path:
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/ComplexAvroKeyGenerator.java
##########
@@ -36,6 +38,9 @@ public ComplexAvroKeyGenerator(TypedProperties props) {
         .split(",")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList());
     this.partitionPathFields = Arrays.stream(props.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())
         .split(",")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList());
+ this.indexKeyFields = props.getStringList(
Review comment:
The bucket key needs to be a subset of the record key, so that records with
the same record key always go into the same bucket, e.g. the bucket key is
`colA` and the record key is `id` + `colA`.
Generally, the record key is `id`, which identifies the record, and the bucket
key is more like an attribute used to cluster the data. When using the bucket
index, the bucket key needs to be added to the record key so that the record
key both determines the data distribution and uniquely identifies a record.
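
To illustrate the constraint, here is a minimal sketch. It is not code from
this PR: the class `BucketKeySketch`, the helpers `isSubset` and `bucketId`,
the bucket count of 8, and the use of `String.hashCode` are all assumptions
made for illustration. It shows that hashing only the bucket key fields keeps
records with the same record key in the same bucket, provided the bucket key
fields are a subset of the record key fields.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class BucketKeySketch {

    // Hypothetical check: every bucket key field must also be a record key field.
    static boolean isSubset(List<String> bucketKeyFields, List<String> recordKeyFields) {
        return recordKeyFields.containsAll(bucketKeyFields);
    }

    // Hypothetical bucketing: hash only the bucket key values, then take the
    // non-negative remainder to pick a bucket. The hash function is an assumption.
    static int bucketId(Map<String, String> record, List<String> bucketKeyFields, int numBuckets) {
        String joined = bucketKeyFields.stream()
            .map(field -> String.valueOf(record.get(field)))
            .collect(Collectors.joining(","));
        return Math.floorMod(joined.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        List<String> recordKeyFields = Arrays.asList("id", "colA"); // record key: id + colA
        List<String> bucketKeyFields = Arrays.asList("colA");       // bucket key: colA

        Map<String, String> first = new HashMap<>();
        first.put("id", "1");
        first.put("colA", "x");

        // A later update to the same record has the same record key (id=1, colA=x).
        Map<String, String> update = new HashMap<>(first);

        System.out.println(isSubset(bucketKeyFields, recordKeyFields));
        System.out.println(bucketId(first, bucketKeyFields, 8) == bucketId(update, bucketKeyFields, 8));
    }
}
```

If `colA` were not part of the record key, two versions of the record with
`id` = 1 could carry different `colA` values and hash to different buckets,
which is the situation the subset requirement rules out.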
> Hash Index for HUDI
> -------------------
>
> Key: HUDI-1951
> URL: https://issues.apache.org/jira/browse/HUDI-1951
> Project: Apache Hudi
> Issue Type: New Feature
> Reporter: XiaoyuGeng
> Assignee: XiaoyuGeng
> Priority: Major
> Labels: pull-request-available
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index