[jira] [Commented] (HUDI-1951) Hash Index for HUDI

ASF GitHub Bot (Jira) Mon, 06 Sep 2021 09:35:20 -0700


    [ 
https://issues.apache.org/jira/browse/HUDI-1951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17410727#comment-17410727
 ]


ASF GitHub Bot commented on HUDI-1951:
--------------------------------------

minihippo commented on a change in pull request #3173:
URL: https://github.com/apache/hudi/pull/3173#discussion_r703004865



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/keygen/ComplexAvroKeyGenerator.java
##########
@@ -36,6 +38,9 @@ public ComplexAvroKeyGenerator(TypedProperties props) {
         .split(",")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList());
     this.partitionPathFields = Arrays.stream(props.getString(KeyGeneratorOptions.PARTITIONPATH_FIELD_NAME.key())
         .split(",")).map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList());
+    this.indexKeyFields = props.getStringList(

Review comment:
       The bucket key needs to be a subset of the record key, so that records 
with the same record key always go into the same bucket, e.g. the bucket key is 
`colA` and the record key is `id` + `colA`.
   Generally, the record key is `id`, which identifies the record, and the 
bucket key is more like an attribute used to cluster the data. When using the 
bucket index, the bucket key needs to be added to the record key, so that the 
record key both determines the data distribution and uniquely identifies a 
record.
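
   To illustrate the subset requirement, here is a minimal sketch. It is not 
the code in this PR: the class `BucketKeySketch`, the helpers `buildKey`, 
`bucketId` and `isSubset`, the `field:value` key layout, and the use of 
`String.hashCode()` modulo the bucket count are all assumptions made for the 
example. It only shows that when the bucket key fields are contained in the 
record key fields, two versions of the same record hash to the same bucket.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; names and hashing scheme are assumptions,
// not Hudi's actual bucket index implementation.
final class BucketKeySketch {

    // Joins the values of the given fields into a single key string,
    // e.g. fields [id, colA] -> "id:1,colA:x".
    public static String buildKey(Map<String, String> record, List<String> fields) {
        return fields.stream()
            .map(f -> f + ":" + record.get(f))
            .collect(Collectors.joining(","));
    }

    // Maps a record to a bucket using only the bucket key fields.
    // floorMod keeps the result in [0, numBuckets) even for negative hashes.
    public static int bucketId(Map<String, String> record, List<String> bucketKeyFields, int numBuckets) {
        return Math.floorMod(buildKey(record, bucketKeyFields).hashCode(), numBuckets);
    }

    // The constraint from the review comment: every bucket key field
    // must also be a record key field.
    public static boolean isSubset(List<String> bucketKeyFields, List<String> recordKeyFields) {
        return recordKeyFields.containsAll(bucketKeyFields);
    }

    public static void main(String[] args) {
        List<String> recordKeyFields = List.of("id", "colA");
        List<String> bucketKeyFields = List.of("colA");

        // Two versions of the same record: same record key, different payload.
        Map<String, String> v1 = Map.of("id", "1", "colA", "x", "price", "10");
        Map<String, String> v2 = Map.of("id", "1", "colA", "x", "price", "20");

        System.out.println("bucket key is subset of record key: "
            + isSubset(bucketKeyFields, recordKeyFields));
        System.out.println("same record key: "
            + buildKey(v1, recordKeyFields).equals(buildKey(v2, recordKeyFields)));
        System.out.println("same bucket: "
            + (bucketId(v1, bucketKeyFields, 4) == bucketId(v2, bucketKeyFields, 4)));
    }
}
```

   If the record key were only `id` while the bucket key stayed `colA`, an 
update that changes `colA` would keep the same record key but could hash to a 
different bucket, which is the case the subset rule is meant to exclude.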




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


> Hash Index for HUDI
> -------------------
>
>                 Key: HUDI-1951
>                 URL: https://issues.apache.org/jira/browse/HUDI-1951
>             Project: Apache Hudi
>          Issue Type: New Feature
>            Reporter: XiaoyuGeng
>            Assignee: XiaoyuGeng
>            Priority: Major
>              Labels: pull-request-available
>
> https://cwiki.apache.org/confluence/display/HUDI/RFC+-+29%3A+Hash+Index



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
