[
https://issues.apache.org/jira/browse/KYLIN-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17746338#comment-17746338
]
ASF subversion and git services commented on KYLIN-5640:
--------------------------------------------------------
Commit 3dc5bfd19c347441efea29b630972f3e950ec20d in kylin's branch
refs/heads/kylin5 from ChenliangLu
[ https://gitbox.apache.org/repos/asf?p=kylin.git;h=3dc5bfd19c ]
KYLIN-5640 Support building dynamic bloom filter that adapts to data
> Support to automatically adjust the Bloom Filter based on data distribution
> ---------------------------------------------------------------------------
>
> Key: KYLIN-5640
> URL: https://issues.apache.org/jira/browse/KYLIN-5640
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine
> Affects Versions: 5.0-alpha
> Reporter: Zhiting Guo
> Assignee: Zhiting Guo
> Priority: Major
> Fix For: 5.0-beta
>
>
> h3. Why are the changes needed?
> Currently, a bloom filter is built from a user-specified NDV (number of
> distinct values). In general scenarios, the actual number of distinct values
> is not known in advance.
> If the BloomFilter can be sized automatically according to the data, the file
> size can be reduced and read efficiency can also be improved.
> h3. What changes were proposed in this pull request?
> {{DynamicBlockBloomFilter}} holds multiple {{BlockSplitBloomFilter}} instances
> as candidates and inserts each value into all candidates at the same time. The
> largest bloom filter is used as an approximate distinct-value counter, and
> candidates that become incapable (too small for the number of distinct values
> seen so far) are removed during data insertion.
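
The candidate-pruning idea described above can be illustrated with a minimal sketch. This is not the Kylin or Parquet implementation: the class names SimpleBloomFilter and DynamicBloomFilterSketch, the candidate NDV list, and the false-positive rate are all hypothetical, and a plain bit-array filter stands in for BlockSplitBloomFilter.

```python
import hashlib
import math


class SimpleBloomFilter:
    """Plain bit-array Bloom filter (hypothetical stand-in for BlockSplitBloomFilter)."""

    def __init__(self, expected_ndv, fpp=0.01):
        self.expected_ndv = expected_ndv
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        self.num_bits = max(8, math.ceil(-expected_ndv * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_ndv * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def insert(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


class DynamicBloomFilterSketch:
    """Keeps several candidate filters and drops those that become too small."""

    def __init__(self, candidate_ndvs, fpp=0.01):
        self.candidates = [SimpleBloomFilter(n, fpp) for n in sorted(candidate_ndvs)]
        self.largest = self.candidates[-1]
        self.distinct_estimate = 0

    def insert(self, value):
        # The largest filter doubles as an approximate distinct-value counter:
        # a value it has not seen yet is counted as new.
        if not self.largest.might_contain(value):
            self.distinct_estimate += 1
            # Drop candidates sized for fewer distinct values than observed,
            # but always keep the largest one.
            self.candidates = [
                c for c in self.candidates
                if c is self.largest or c.expected_ndv >= self.distinct_estimate
            ]
        for c in self.candidates:
            c.insert(value)

    def best_candidate(self):
        # Smallest surviving filter that can still hold the observed data.
        return self.candidates[0]


# Usage: 500 distinct values, each inserted 10 times.
dyn = DynamicBloomFilterSketch([100, 1_000, 10_000])
for i in range(5_000):
    dyn.insert(i % 500)
best = dyn.best_candidate()
```

In this run the candidate sized for 100 distinct values is discarded once the estimate passes 100, so the filter sized for 1,000 is the smallest one left. Because false positives in the largest filter can only cause undercounting, the estimate never exceeds the true number of distinct values.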
--
This message was sent by Atlassian Jira
(v8.20.10#820010)