[ 
https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781854#comment-16781854
 ] 

KANG-SEN LU commented on KYLIN-2620:
------------------------------------

If a TOPN(SUM(X), GROUP-BY D1) measure is configured in a Kylin cube, a query 
should be answered from that measure only if it meets all of the following 
conditions:
 # the GROUP-BY list includes the D1 dimension,
 # it has ORDER-BY SUM(X),
 # it has LIMIT n, where n <= the TOPN measure's limit.

Conditions 2 and 3 are mentioned in the bug description, but I think 
condition 1 is important as well. We don't want Kylin to use 
TOPN(SUM(X), GROUP-BY D1) when the query does not have GROUP-BY D1. If Kylin 
rewrote SUM(X) to TOPN(SUM(X)) in that case, it would have to aggregate over 
all D1 values, which may lose accuracy if Kylin did not save every D1 value 
in its cuboid.
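
To make the three conditions concrete, here is a minimal Python sketch of the check. It is not Kylin's actual code: TopNMeasure, Query, can_use_topn and truncated_total are hypothetical names, and the query model is heavily simplified. The second function is a toy model of the accuracy concern, assuming the measure keeps only the largest per-D1 sums.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class TopNMeasure:
    """Hypothetical description of a TOPN(SUM(X), GROUP-BY D1) measure."""
    sum_column: str       # X
    group_by_column: str  # D1
    limit: int            # limit configured for the TOPN measure


@dataclass(frozen=True)
class Query:
    """Hypothetical, simplified view of a parsed query."""
    sum_column: str        # column inside SUM(...)
    group_by: Sequence[str]
    order_by_sum: bool     # True if the query has ORDER BY SUM(X)
    limit: Optional[int]   # None if the query has no LIMIT clause


def can_use_topn(query: Query, measure: TopNMeasure) -> bool:
    """Return True only if all three conditions from the comment hold."""
    if query.sum_column != measure.sum_column:
        return False
    has_group_by = measure.group_by_column in query.group_by        # condition 1
    has_order_by = query.order_by_sum                               # condition 2
    has_limit = query.limit is not None and query.limit <= measure.limit  # condition 3
    return has_group_by and has_order_by and has_limit


def truncated_total(sums_by_d1: Dict[str, int], capacity: int) -> int:
    """Toy model (an assumption): total computed from only the `capacity`
    largest per-D1 sums, as if the remaining D1 values were not saved."""
    return sum(sorted(sums_by_d1.values(), reverse=True)[:capacity])


measure = TopNMeasure(sum_column="X", group_by_column="D1", limit=100)

# SELECT D1, SUM(X) ... GROUP BY D1 ORDER BY SUM(X) DESC LIMIT 10
ok = can_use_topn(Query("X", ["D1"], True, 10), measure)

# SELECT SUM(X) ...  (no GROUP BY D1, so condition 1 fails)
no_group_by = can_use_topn(Query("X", [], True, 10), measure)

# Without GROUP BY D1 the total would come from the retained values only.
sums = {"a": 50, "b": 30, "c": 15, "d": 5}
exact = sum(sums.values())
approx = truncated_total(sums, capacity=2)
```

In this toy data the exact total is 100, while the total over the two retained D1 values is 80, which is the kind of accuracy loss condition 1 guards against.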

> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
> ----------------------------------------------------------------
>
>                 Key: KYLIN-2620
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2620
>             Project: Kylin
>          Issue Type: Bug
>          Components: Measure - TopN
>            Reporter: Lin Tingmao
>            Assignee: Chao Long
>            Priority: Major
>             Fix For: v2.6.2
>
>
> When running the following query
> select sum(measure) from table group by col_id
> if there exists TOPN(measure, group by col_id)  measure, 
> TopNMeasureType.isTopNCompatibleSum()    will pass, so the SUM is rewritten 
> to TOPN. This confuses the user since they may expect an accurate result for 
> every distinct value of the group by column(s). 
> Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the 
> query to determine whether to rewrite.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
