[jira] [Updated] (KYLIN-5564) Introduce Bloom Filter to optimize data scanning based on Spark

Guangyuan Feng (Jira) Tue, 13 Jun 2023 00:33:05 -0700


     [ 
https://issues.apache.org/jira/browse/KYLIN-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Guangyuan Feng updated KYLIN-5564:
----------------------------------
    Fix Version/s: 5.0-beta
                       (was: 5.0-alpha)

> Introduce Bloom Filter to optimize data scanning based on Spark
> ---------------------------------------------------------------
>
>                 Key: KYLIN-5564
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5564
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: Guangyuan Feng
>            Assignee: Guangyuan Feng
>            Priority: Major
>             Fix For: 5.0-beta
>
>         Attachments: RowGroup BloomFilter 场景介绍和性能测试.pdf
>
>
> Currently, all the data generated by Kylin is saved as *Parquet* files
> through Spark, but Kylin has not made full use of Parquet's features
> when scanning data. Among them, BloomFilter deserves particular attention,
> because it is the most common tool to help *READERs* skip useless data.
> Therefore, we introduce an approach that builds *BloomFilter* automatically,
> conditionally and smartly when constructing segments, on the desired columns,
> chosen especially according to the query history.
> With BloomFilter in place, Spark shows a good performance improvement
> in most cases.
>  
> _For benchmarks and performance tests, please read the attached PDF, which
> reports the tests run on SSB._
>  
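
For readers unfamiliar with the idea in the description, the following is a minimal sketch of how a per-row-group Bloom filter lets a reader skip data on an equality predicate. It is not Kylin's implementation and not Parquet's on-disk filter format; the class and function names (BloomFilter, build_row_groups, scan_equals), the filter size and the hash scheme are all assumptions chosen for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_row_groups(rows, group_size):
    """Split rows into groups and build one filter per group (hypothetical layout)."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        bf = BloomFilter()
        for value in chunk:
            bf.add(value)
        groups.append((bf, chunk))
    return groups


def scan_equals(groups, target):
    """Return matching rows and the number of row groups actually read."""
    matches, groups_read = [], 0
    for bf, chunk in groups:
        if not bf.might_contain(target):
            continue  # definitely absent: skip this row group entirely
        groups_read += 1
        matches.extend(v for v in chunk if v == target)
    return matches, groups_read


if __name__ == "__main__":
    rows = [f"customer_{i}" for i in range(1000)]
    groups = build_row_groups(rows, group_size=100)
    matches, groups_read = scan_equals(groups, "customer_42")
    print(f"matches={matches}, row groups read={groups_read} of {len(groups)}")
```

Because a Bloom filter can report false positives but never false negatives, the scan may occasionally read a row group that holds no match, but it never skips one that does. The benefit is largest for selective equality filters on high-cardinality columns, which is presumably why the description ties filter construction to the query history.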



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
