[ 
https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dayue Gao resolved KYLIN-2438.
------------------------------
       Resolution: Fixed
    Fix Version/s: v2.0.0

commit 
https://github.com/apache/kylin/commit/09a086688a664585c57b715046a9869b75351a52

> replace scan threshold with max scan bytes
> ------------------------------------------
>
>                 Key: KYLIN-2438
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2438
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine, Storage - HBase
>    Affects Versions: v1.6.0
>            Reporter: Dayue Gao
>            Assignee: Dayue Gao
>             Fix For: v2.0.0
>
>
> To guard against bad queries that can consume a lot of memory and
> potentially crash the Kylin or HBase server, Kylin limits the maximum number
> of rows a query can scan. The maximum is chosen based on two configs:
> # *kylin.query.scan.threshold* is used if the query doesn't contain
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the
> per-region maximum.
> This approach, however, has several deficiencies:
> * It doesn't work well with complex, variable-length metrics. The estimated
> threshold can be either too small or too large. If it's too small, good
> queries are killed. If it's too large, bad queries are not blocked.
> * Row count doesn't correspond to memory consumption, so it's difficult to
> determine how large the scan threshold should be.
> * kylin.query.scan.threshold can't be overridden at the cube level.
> In this JIRA, I propose to replace the current row-count-based threshold with
> a more intuitive size-based threshold:
> * KYLIN-2437 will collect the number of bytes scanned at both region and
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limit
> the maximum number of bytes a query can scan
> * *kylin.query.mem.budget* will be renamed to
> *kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits the bytes
> scanned at the region level. There is no need to rely on row size estimates
> any more.
> * Both of the above configs can be overridden at the cube level
> * The old *kylin.query.scan.threshold* will be deprecated
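
The two-level limit proposed above can be illustrated with a minimal Python sketch. This is not Kylin's implementation: the names ScanByteBudget, ScanLimitExceeded and scan_query are hypothetical, and each region is modelled simply as a list of row sizes in bytes. It only shows the idea of counting bytes against a per-region budget (in the spirit of kylin.storage.hbase.coprocessor-max-scan-bytes) and a per-query budget (in the spirit of kylin.query.max-scan-bytes), with no row size estimation involved.

```python
class ScanLimitExceeded(Exception):
    """Raised when a scan has read more bytes than its budget allows."""


class ScanByteBudget:
    """Hypothetical helper: counts scanned bytes against a fixed limit."""

    def __init__(self, name, max_bytes):
        self.name = name
        self.max_bytes = max_bytes
        self.scanned = 0

    def add(self, nbytes):
        # Account for the bytes actually read, then enforce the limit.
        self.scanned += nbytes
        if self.scanned > self.max_bytes:
            raise ScanLimitExceeded(
                f"{self.name} scan exceeded {self.max_bytes} bytes"
            )


def scan_query(regions, max_scan_bytes, coprocessor_max_scan_bytes):
    """Scan every region, enforcing both budgets.

    regions: list of regions, each a list of row sizes in bytes
             (an assumption made for this illustration).
    Returns the total number of bytes scanned by the query.
    """
    query_budget = ScanByteBudget("query", max_scan_bytes)
    for rows in regions:
        # Each region gets a fresh budget; the query budget is shared.
        region_budget = ScanByteBudget("region", coprocessor_max_scan_bytes)
        for row_size in rows:
            region_budget.add(row_size)
            query_budget.add(row_size)
    return query_budget.scanned
```

In this sketch a scan is aborted as soon as either budget is exceeded, so a single oversized region and a query that is large only in aggregate are both caught. In a real deployment the two checks would presumably run in different places (region server versus query server), which the sketch does not model.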



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
