[
https://issues.apache.org/jira/browse/HUDI-9164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935293#comment-17935293
]
Y Ethan Guo commented on HUDI-9164:
-----------------------------------
Improved the description for clarity.
> Improve file group sharing strategy of secondary index in MDT
> -------------------------------------------------------------
>
> Key: HUDI-9164
> URL: https://issues.apache.org/jira/browse/HUDI-9164
> Project: Apache Hudi
> Issue Type: Improvement
> Reporter: Davis Zhang
> Priority: Blocker
> Fix For: 1.1.0
>
>
> h3. MDT secondary index file layout does not support efficient lookup/join
> For an MDT join against an incoming pruning set RDD[Internal Row], the
> existing secondary index data layout does not support efficient batch prefix
> lookup.
>
> h4. MDT index layout
> The MDT secondary index stores key-value pairs, where the key uses the scheme
> <data column value><separator><record key value>
> and the value is the file group id.
>
> So every key in the index is prefixed with the data column value.
>
> It adopts {*}hash-based partitioning{*}: it takes the full key <data
> column value><separator><record key value>, hashes it, and uses the hash to
> decide which file group the record belongs to.
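The key scheme and hash-based partitioning described above can be sketched as follows. This is a minimal illustration, not Hudi's implementation: the separator character, the file group count, the MD5-based hash, and the function names are all assumptions made for the example.

```python
import hashlib

SEPARATOR = "$"        # hypothetical separator; the ticket does not name the real one
NUM_FILE_GROUPS = 8    # hypothetical number of file groups in the index partition


def secondary_index_key(column_value: str, record_key: str) -> str:
    """Build <data column value><separator><record key value>."""
    return f"{column_value}{SEPARATOR}{record_key}"


def file_group_for(full_key: str, num_file_groups: int = NUM_FILE_GROUPS) -> int:
    """Hash the full key and map it to a file group index.

    MD5 is used only to get a deterministic hash for the sketch.
    """
    digest = hashlib.md5(full_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_file_groups


# 100 records that all share the same data column value "NYC".
keys = [secondary_index_key("NYC", f"rec-{i}") for i in range(100)]
groups_hit = {file_group_for(k) for k in keys}
```

Because the record key is part of the hashed input, keys that share the same column value prefix are spread over many file groups (`groups_hit` contains more than one index).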
>
> h3. The query pattern it serves
>
> In a nutshell, the data layout is hash partitioning while the query pattern
> is prefix lookup; the two do not match at all.
>
> There are 2 types of query pattern against the index:
> * Point lookup given only a secondary index column value: only
> {{<data column value>}} is given, and we need to look up all file group ids
> associated with it.
> * Join with a large number of column values: this is how a secondary index
> join would work. When joining tableGeneratingPruningSet and
> tableWithIdxTobePruned, tableGeneratingPruningSet generates an RDD of
> values for data column C1, and we join this RDD with the MDT C1 secondary
> index to figure out the file group ids of interest. Here we are looking at a
> join between this RDD and the MDT at a large scale.
>
> Because the input only gives us {{<data column value>}}, which is only the
> prefix of the secondary index key, we don't know which bucket the potential
> MDT records belong to. As a result, even a point lookup needs to load the
> full MDT, and the complexity is O(n) in the number of secondary index records.
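To make the cost concrete, here is a small sketch of a prefix lookup against a hash-partitioned index. The separator, bucket count, hash function, sample records, and helper names are hypothetical and chosen only for illustration; the point is that the lookup cannot skip any bucket when it knows only the column value.

```python
import hashlib

SEPARATOR = "$"        # hypothetical separator
NUM_FILE_GROUPS = 4    # hypothetical number of index file groups (buckets)


def bucket_of(full_key: str) -> int:
    """Hash the full key (column value + separator + record key) to a bucket."""
    digest = hashlib.md5(full_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_FILE_GROUPS


# Sample records: (data column value, record key, data file group id).
records = [("NYC", f"rec-{i}", f"fg-{i % 3}") for i in range(30)]
records += [("SFO", f"rec-{i + 30}", f"fg-{3 + i % 2}") for i in range(30)]

# Build the hash-partitioned index: one dict per bucket, full key -> file group id.
index = [dict() for _ in range(NUM_FILE_GROUPS)]
for column_value, record_key, file_group_id in records:
    full_key = f"{column_value}{SEPARATOR}{record_key}"
    index[bucket_of(full_key)][full_key] = file_group_id


def prefix_lookup(index, column_value):
    """Return the matching file group ids and the number of entries scanned."""
    prefix = column_value + SEPARATOR
    hits, scanned = set(), 0
    for bucket in index:                      # the prefix alone cannot pick a bucket
        for full_key, file_group_id in bucket.items():
            scanned += 1
            if full_key.startswith(prefix):
                hits.add(file_group_id)
    return hits, scanned


hits, scanned = prefix_lookup(index, "NYC")
```

In this sketch a single point lookup for `"NYC"` scans every entry in every bucket, and a join over many column values repeats that full scan unless the whole index is loaded once.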
>
> This is not scalable.
>
> The partitioning scheme needs improvement to handle prefix-based search at
> a large scale.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)