[ 
https://issues.apache.org/jira/browse/KYLIN-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoxiang Yu resolved KYLIN-5640.
---------------------------------
    Resolution: Fixed

> Support to automatically adjust the Bloom Filter based on data distribution
> ---------------------------------------------------------------------------
>
>                 Key: KYLIN-5640
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5640
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: Zhiting Guo
>            Assignee: Zhiting Guo
>            Priority: Major
>             Fix For: 5.0-beta
>
>
> h3. Why are the changes needed?
> Currently, using a Bloom filter requires specifying the NDV (number of distinct 
> values) up front, and the filter is then built for that size. In general 
> scenarios, the actual number of distinct values is not known in advance.
> If the Bloom filter can be sized automatically according to the data, the file 
> size can be reduced and read efficiency can also be improved.
> h3. What changes were proposed in this pull request?
> {{DynamicBlockBloomFilter}} holds multiple {{BlockSplitBloomFilter}} instances as 
> candidates and inserts each value into all candidates at the same time. It uses the 
> largest Bloom filter as an approximate distinct-value counter, and removes 
> candidates that are too small for the observed data during insertion.
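
The candidate-pruning idea described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the Kylin or Parquet implementation: SimpleBloomFilter, DynamicBloomFilter, candidate_ndvs and fpp are hypothetical names, a plain bit-array filter stands in for BlockSplitBloomFilter, and SHA-256 double hashing stands in for whatever hash the real classes use.

```python
import hashlib
import math


class SimpleBloomFilter:
    """Plain bit-array Bloom filter (hypothetical stand-in for BlockSplitBloomFilter)."""

    def __init__(self, expected_ndv, fpp=0.01):
        self.expected_ndv = expected_ndv
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2, k = (m / n) * ln 2
        self.num_bits = max(8, math.ceil(-expected_ndv * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_ndv * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(repr(value).encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for p in self._positions(value):
            self.bits[p >> 3] |= 1 << (p & 7)

    def might_contain(self, value):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(value))


class DynamicBloomFilter:
    """Keeps several candidate filters and drops those that become too small."""

    def __init__(self, candidate_ndvs, fpp=0.01):
        # Candidates are kept sorted by capacity; the last one is the largest.
        self.candidates = [SimpleBloomFilter(n, fpp) for n in sorted(candidate_ndvs)]
        self.distinct_estimate = 0

    def add(self, value):
        largest = self.candidates[-1]
        # The largest filter doubles as an approximate distinct counter:
        # a value it has not seen yet is counted as new (may undercount slightly).
        if not largest.might_contain(value):
            self.distinct_estimate += 1
        for candidate in self.candidates:
            candidate.add(value)
        # Drop candidates whose capacity the estimate has exceeded; always keep the largest.
        survivors = [c for c in self.candidates[:-1] if c.expected_ndv >= self.distinct_estimate]
        self.candidates = survivors + [largest]

    def best(self):
        # Smallest surviving candidate, i.e. the most compact filter that still fits.
        return self.candidates[0]


# Usage: 500 distinct values, each inserted twice, with three candidate sizes.
values = list(range(500)) * 2
dbf = DynamicBloomFilter(candidate_ndvs=[100, 1000, 10000])
for v in values:
    dbf.add(v)
print(dbf.best().expected_ndv, dbf.distinct_estimate)
```

In this sketch the candidate sized for 100 values is discarded once the estimate passes 100, so the filter that would be written out is the one sized for 1000. Inserting into every candidate costs extra CPU and memory during the build, which is the trade-off for not needing the NDV up front.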



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
