[ https://issues.apache.org/jira/browse/KYLIN-5640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiaoxiang Yu resolved KYLIN-5640.
---------------------------------
    Resolution: Fixed

> Support to automatically adjust the Bloom Filter based on data distribution
> ---------------------------------------------------------------------------
>
>                 Key: KYLIN-5640
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5640
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: Zhiting Guo
>            Assignee: Zhiting Guo
>            Priority: Major
>             Fix For: 5.0-beta
>
>
> h3. Why are the changes needed?
> Currently, using a Bloom filter requires specifying the NDV (number of
> distinct values) up front and then building the filter. In general, the
> number of distinct values is not known in advance.
> If the Bloom filter can be sized automatically according to the data, the
> file size can be reduced and read efficiency can also be improved.
> h3. What changes were proposed in this pull request?
> {{DynamicBlockBloomFilter}} holds multiple {{BlockSplitBloomFilter}}
> instances as candidates and inserts each value into all candidates at the
> same time. The largest Bloom filter serves as an approximate distinct-value
> counter, and candidates that are too small for the number of distinct values
> observed so far are removed during data insertion.
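The candidate-pruning idea described above can be illustrated with a minimal Python sketch. This is not the Kylin implementation: the class names SimpleBloomFilter and DynamicBloomFilterSketch, the SHA-256 double hashing, the 1% false-positive target, and the rule "drop a candidate once the estimated distinct count exceeds its expected NDV" are all assumptions made for illustration. A plain bit-array filter stands in for {{BlockSplitBloomFilter}}.

```python
import hashlib
import math


class SimpleBloomFilter:
    """Plain bit-array Bloom filter (hypothetical stand-in for a block-split filter)."""

    def __init__(self, expected_ndv, fpp=0.01):
        self.expected_ndv = expected_ndv
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = max(8, math.ceil(-expected_ndv * math.log(fpp) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / expected_ndv * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, value):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(repr(value).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(value))


class DynamicBloomFilterSketch:
    """Keeps several candidate filters and prunes the ones that are too small."""

    def __init__(self, candidate_ndvs, fpp=0.01):
        # Candidates are kept sorted by capacity, smallest first.
        self.candidates = [SimpleBloomFilter(n, fpp) for n in sorted(set(candidate_ndvs))]
        self.largest = self.candidates[-1]
        self.distinct_estimate = 0

    def insert(self, value):
        # The largest filter doubles as an approximate distinct counter:
        # a value it has not seen yet is counted as new.
        if not self.largest.might_contain(value):
            self.distinct_estimate += 1
        for candidate in self.candidates:
            candidate.add(value)
        # Assumed pruning rule: drop candidates sized for fewer distinct values
        # than observed so far, but always keep the largest one.
        self.candidates = [
            c for c in self.candidates
            if c.expected_ndv >= self.distinct_estimate or c is self.largest
        ]

    def best(self):
        # Smallest surviving candidate, i.e. the most compact filter that still fits.
        return self.candidates[0]
```

After all values have been inserted, best() returns the most compact surviving candidate, which is the filter that would be written out. Because false positives in the largest filter can only cause a new value to be missed, distinct_estimate may slightly undercount but never overcounts.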