[
https://issues.apache.org/jira/browse/KYLIN-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16781854#comment-16781854
]
KANG-SEN LU commented on KYLIN-2620:
------------------------------------
If a TOPN(SUM(X), GROUP-BY D1) metric is configured in a Kylin cube, a query
must meet all of the following conditions to be answered from that metric:
# The GROUP-BY list includes the D1 dimension.
# The query has ORDER-BY SUM(X).
# The query has LIMIT n, where n <= the TOPN limit.
Conditions 2 and 3 are mentioned in the bug description, but I think condition 1
is important as well. We don't want Kylin to use TOPN(SUM(X), GROUP-BY D1)
when the query does not have GROUP-BY D1. If Kylin rewrote SUM(X) to
TOPN(SUM(X)) in that case, it would have to aggregate over all D1 values. That
may lose accuracy if Kylin did not save every D1 value in its cuboid.
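As a rough illustration of the three conditions, the check could be sketched as
below. This is not Kylin's actual code: the function name and all parameters
are hypothetical, the query is assumed to be already parsed into its GROUP-BY
list, ORDER-BY expression and LIMIT, and sort direction is ignored.

```python
from typing import Optional, Sequence


def can_rewrite_sum_as_topn(
    group_by: Sequence[str],
    order_by: Optional[str],
    limit: Optional[int],
    topn_group_by: str,
    topn_measure: str,
    topn_limit: int,
) -> bool:
    """Return True only if all three conditions listed above hold.

    Hypothetical helper for illustration; the names are assumptions,
    not part of Kylin.
    """
    # Condition 1: the GROUP-BY list includes the TOPN dimension (D1).
    if topn_group_by not in group_by:
        return False
    # Condition 2: the query is ordered by the TOPN measure, SUM(X).
    if order_by != topn_measure:
        return False
    # Condition 3: LIMIT n is present and n <= the TOPN limit.
    if limit is None or limit > topn_limit:
        return False
    return True
```

With this sketch, a query with GROUP-BY D1, ORDER-BY SUM(X) and LIMIT 10 would
qualify against a TOPN limit of 100, while the same query without GROUP-BY D1,
or without ORDER-BY and LIMIT, would fall back to a plain SUM.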
> Check for "ORDER BY LIMIT" clause when rewrite SUM query as TOPN
> ----------------------------------------------------------------
>
> Key: KYLIN-2620
> URL: https://issues.apache.org/jira/browse/KYLIN-2620
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: Lin Tingmao
> Assignee: Chao Long
> Priority: Major
> Fix For: v2.6.2
>
>
> When running the following query
> select sum(measure) from table group by col_id
> if there exists TOPN(measure, group by col_id) measure,
> TopNMeasureType.isTopNCompatibleSum() will pass, so the SUM is rewritten
> to TOPN. This confuses the user since they may expect an accurate result for
> every distinct value of group by column(s).
> Kylin should check if "ORDER BY col_id LIMIT topncapacity" is present in the
> query to determine whether to rewrite.