[
https://issues.apache.org/jira/browse/KYLIN-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dayue Gao resolved KYLIN-2438.
------------------------------
Resolution: Fixed
Fix Version/s: v2.0.0
commit
https://github.com/apache/kylin/commit/09a086688a664585c57b715046a9869b75351a52
> replace scan threshold with max scan bytes
> ------------------------------------------
>
> Key: KYLIN-2438
> URL: https://issues.apache.org/jira/browse/KYLIN-2438
> Project: Kylin
> Issue Type: Improvement
> Components: Query Engine, Storage - HBase
> Affects Versions: v1.6.0
> Reporter: Dayue Gao
> Assignee: Dayue Gao
> Fix For: v2.0.0
>
>
> In order to guard against bad queries that can consume lots of memory and
> potentially crash the Kylin / HBase server, Kylin limits the maximum number of
> rows a query can scan. The maximum value is chosen based on two configs:
> # *kylin.query.scan.threshold* is used if the query doesn't contain
> memory-hungry metrics
> # *kylin.query.mem.budget* / estimated_row_size is used otherwise as the per
> region maximum.
> This approach however has several deficiencies:
> * It doesn't work with complex, varlen metrics very well. The estimated
> threshold could be either too small or too large. If it's too small, good
> queries are killed. If it's too large, bad queries are not banned.
> * Row count doesn't correspond to memory consumption, thus it's difficult to
> determine how large the scan threshold should be.
> * kylin.query.scan.threshold can't be overridden at cube level.
> In this JIRA, I propose to replace the current row-count-based threshold with
> a more intuitive size-based threshold:
> * KYLIN-2437 will collect the number of bytes scanned at both region and
> query level
> * A new configuration *kylin.query.max-scan-bytes* will be added to limit
> the maximum number of bytes a query can scan
> * *kylin.query.mem.budget* will be renamed to
> *kylin.storage.hbase.coprocessor-max-scan-bytes*, which limits at region
> level. No need to rely on estimations about row size any more.
> * The above two configs can be overridden at cube level
> * The old *kylin.query.scan.threshold* will be deprecated
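
The size-based limits proposed above can be sketched roughly as follows. This is an illustrative Python model only, not the code from the linked commit (Kylin itself is written in Java). The two config keys come from the issue description; the default values, the class and function names, and the choice to treat a limit as exceeded only when the byte count is strictly greater than it are all assumptions made for the example.

```python
from typing import Dict, Optional

# Config keys are taken from the issue text; the default values are
# hypothetical and chosen only to make the example concrete.
QUERY_KEY = "kylin.query.max-scan-bytes"
REGION_KEY = "kylin.storage.hbase.coprocessor-max-scan-bytes"
GLOBAL_CONFIG: Dict[str, int] = {
    QUERY_KEY: 1 << 30,      # 1 GiB per query (assumed)
    REGION_KEY: 256 << 20,   # 256 MiB per region (assumed)
}


class ScanLimitExceeded(Exception):
    """Raised when a region or a whole query scans too many bytes."""


def effective_limit(key: str, cube_overrides: Optional[Dict[str, int]] = None) -> int:
    """Return the cube-level override if present, else the global value."""
    if cube_overrides and key in cube_overrides:
        return cube_overrides[key]
    return GLOBAL_CONFIG[key]


class QueryScanTracker:
    """Hypothetical helper that accumulates scanned bytes for one query."""

    def __init__(self, cube_overrides: Optional[Dict[str, int]] = None) -> None:
        self.query_limit = effective_limit(QUERY_KEY, cube_overrides)
        self.region_limit = effective_limit(REGION_KEY, cube_overrides)
        self.query_bytes = 0

    def add_region_result(self, region_bytes: int) -> None:
        # Region-level check: no row-size estimation is involved,
        # only the number of bytes actually scanned.
        if region_bytes > self.region_limit:
            raise ScanLimitExceeded(
                f"region scanned {region_bytes} bytes, limit {self.region_limit}")
        # Query-level check on the running total across regions.
        self.query_bytes += region_bytes
        if self.query_bytes > self.query_limit:
            raise ScanLimitExceeded(
                f"query scanned {self.query_bytes} bytes, limit {self.query_limit}")
```

A cube that legitimately needs larger scans would pass its own overrides, for example `QueryScanTracker({REGION_KEY: 512 << 20})`, while all other cubes keep the global limits.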