[
https://issues.apache.org/jira/browse/KYLIN-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Guangyuan Feng updated KYLIN-5564:
----------------------------------
Fix Version/s: 5.0-beta
(was: 5.0-alpha)
> Introduce Bloom Filter to optimize data scanning based on Spark
> ---------------------------------------------------------------
>
> Key: KYLIN-5564
> URL: https://issues.apache.org/jira/browse/KYLIN-5564
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine
> Affects Versions: 5.0-alpha
> Reporter: Guangyuan Feng
> Assignee: Guangyuan Feng
> Priority: Major
> Fix For: 5.0-beta
>
> Attachments: RowGroup BloomFilter 场景介绍和性能测试.pdf
>
>
> Currently, all the data generated by Kylin is saved as *Parquet* files
> through Spark, but Kylin has not made full use of Parquet's features
> when scanning data. Among them, BloomFilter deserves emphasis, because it is
> the most common tool for helping *READERs* skip useless data.
> Therefore, we introduced an approach that builds *BloomFilter* automatically,
> conditionally and smartly when constructing segments, on the desired columns,
> which are chosen mainly according to the query history.
> With BloomFilter in place, Spark shows a good performance improvement
> in most cases.
>
> _For the benchmarks and performance tests, please read the attached PDF, which is
> the report of the tests run on SSB._
>
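To illustrate how a per-row-group Bloom filter lets a reader skip data, here is a minimal, self-contained sketch. It is not Kylin's or Parquet's implementation: the class `BloomFilter`, the helpers `build_row_groups` and `scan_equals`, the filter size, and the SHA-256 based hashing are all assumptions chosen for readability (Parquet itself specifies a split-block Bloom filter using xxHash). Each "row group" is simply a list of values with its own filter.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical sizes; not Parquet's on-disk format)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_row_groups(values, group_size):
    """Split values into row groups and build one filter per group."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        bloom = BloomFilter()
        for v in chunk:
            bloom.add(v)
        groups.append((bloom, chunk))
    return groups


def scan_equals(groups, target):
    """Equality scan that skips row groups whose filter rules out target."""
    matches, scanned = [], 0
    for bloom, chunk in groups:
        if not bloom.might_contain(target):
            continue  # skipped without reading the row group
        scanned += 1
        matches.extend(v for v in chunk if v == target)
    return matches, scanned


if __name__ == "__main__":
    data = [f"cust-{i}" for i in range(1000)]
    row_groups = build_row_groups(data, group_size=100)
    found, scanned = scan_equals(row_groups, "cust-42")
    print(found, f"scanned {scanned} of {len(row_groups)} row groups")
```

Because a Bloom filter never yields false negatives, the row group that holds the value is always read; false positives can only cause a few extra row groups to be read. This is why the technique helps equality predicates on high-cardinality columns most.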
--
This message was sent by Atlassian Jira
(v8.20.10#820010)