[
https://issues.apache.org/jira/browse/KYLIN-5787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17835371#comment-17835371
]
pengfei.zhan edited comment on KYLIN-5787 at 4/9/24 12:17 PM:
--------------------------------------------------------------
h1. The old behavior
|| ||*percentile*||*percentile_approx*||
|Precomputation|t-digest|t-digest|
|runtime computation|QuantileSummaries|QuantileSummaries|
|pushdown / spark-sql|Sort and take the exact value|QuantileSummaries|
h1. Design
Add a configuration item, "kylin.query.percentile-approx-algorithm". Its default
value is null, which keeps the current behavior unchanged. A project-level
setting is not supported, and KYLIN must be restarted for a change to take effect.
When the value is set to "t-digest", the behavior is as follows:
|| ||*percentile*||*percentile_approx*||
|Precomputation|t-digest|t-digest|
|runtime computation|t-digest|t-digest|
|pushdown|Sort and take the exact value|t-digest|
|spark-sql|Sort and take the exact value|QuantileSummaries|
Runtime computation means that the query needs extra aggregation on top of the
layout (also called a cuboid).
For more information, please refer to:
https://cn.kyligence.io/resources/kyligence-public-seminar-190403/
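The two tables above can be read as a lookup from (query path, function, configured value) to the algorithm used. The sketch below only restates those tables for illustration. The names OLD_BEHAVIOR, T_DIGEST_OVERRIDES and algorithm_for are hypothetical and are not taken from the KYLIN code base.

```python
from typing import Optional

# Old behavior, copied from the first table. All names here are hypothetical.
OLD_BEHAVIOR = {
    ("precomputation", "percentile"): "t-digest",
    ("precomputation", "percentile_approx"): "t-digest",
    ("runtime computation", "percentile"): "QuantileSummaries",
    ("runtime computation", "percentile_approx"): "QuantileSummaries",
    ("pushdown", "percentile"): "exact (sort)",
    ("pushdown", "percentile_approx"): "QuantileSummaries",
    ("spark-sql", "percentile"): "exact (sort)",
    ("spark-sql", "percentile_approx"): "QuantileSummaries",
}

# Cells that differ in the second table when the value "t-digest" is configured.
T_DIGEST_OVERRIDES = {
    ("runtime computation", "percentile"): "t-digest",
    ("runtime computation", "percentile_approx"): "t-digest",
    ("pushdown", "percentile_approx"): "t-digest",
}


def algorithm_for(path: str, function: str, configured: Optional[str] = None) -> str:
    """Return the algorithm the tables list for a query path and function.

    `configured` stands for the value of kylin.query.percentile-approx-algorithm;
    None models the default (null), which keeps the old behavior.
    """
    key = (path, function)
    if configured == "t-digest":
        return T_DIGEST_OVERRIDES.get(key, OLD_BEHAVIOR[key])
    return OLD_BEHAVIOR[key]


if __name__ == "__main__":
    for path in ("precomputation", "runtime computation", "pushdown", "spark-sql"):
        print(path, algorithm_for(path, "percentile_approx", "t-digest"))
```

Per the tables, spark-sql keeps QuantileSummaries for percentile_approx even when "t-digest" is configured, and percentile on pushdown and spark-sql stays exact in both cases.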
> Use t-digest as spark percentile_approx function
> ------------------------------------------------
>
> Key: KYLIN-5787
> URL: https://issues.apache.org/jira/browse/KYLIN-5787
> Project: Kylin
> Issue Type: Improvement
> Components: Job Engine, Query Engine
> Affects Versions: 5.0-beta
> Reporter: pengfei.zhan
> Assignee: pengfei.zhan
> Priority: Critical
> Fix For: 5.0-beta
>
>
> The underlying implementation of the percentile_approx function in KYLIN is
> the open-source t-digest.
> The underlying implementation of the percentile_approx function in spark is
> spark's own PercentileDigest (based on QuantileSummaries).
> Different implementations lead to different results.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)