[ https://issues.apache.org/jira/browse/KYLIN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shaofeng SHI updated KYLIN-2617:
--------------------------------
Component/s: Measure - TopN
> SUM when rewritten as TOPN does not give consistent (correct) result?
> ---------------------------------------------------------------------
>
> Key: KYLIN-2617
> URL: https://issues.apache.org/jira/browse/KYLIN-2617
> Project: Kylin
> Issue Type: Bug
> Components: Measure - TopN
> Reporter: liyang
> Priority: Major
>
> From Tingmao Lin <[email protected]>
> We found that a SUM() query on a cardinality-1 dimension is not accurate (or
> "not correct") when it is automatically rewritten as TOPN.
> Is this the expected behavior of Kylin, or is there some other issue?
> We built a cube on a table (measure1: bigint, dim1_id: varchar,
> dim2_id: varchar, ...) using Kylin 1.6.0 (Kafka streaming source).
> The cube has two measures: SUM(measure1) and
> TOPN(10,sum-orderby(measure1),group by dim2_id) (other measures omitted),
> and two dimensions: dim1_id and dim2_id (other dimensions omitted).
> About the source table data:
> The cardinality of dim1_id is 1 (same dim1_id for all rows in the source
> table)
> The cardinality of dim2_id is 1 (same dim2_id for all rows in the source
> table)
> The possible values of measure1 are [1,0,-1].
>
> When we query
> "select SUM(measure1) FROM table GROUP BY dim2_id"
> the result has one row: "sum=7".
> From the Kylin logs we found that the query was automatically
> rewritten as TOPN(measure1,sum-orderby(measure1),group by dim2_id).
> When we write another query that prevents the TOPN rewrite, for example:
> "select SUM(measure1),count(*) FROM table GROUP BY dim2_id" => one
> row -- "sum=-2,count=24576"
> "select SUM(measure1),count(*) FROM table"
> => one row -- "sum=-2,count=24576"
> The result differs (7 vs. -2) depending on whether the query is rewritten
> to TOPN or not.
> My question is: do the following behaviors "work as expected", does the TOPN
> algorithm not handle negative counter values well, or is there some other
> issue?
> 1. A SUM() query is automatically rewritten as TOPN and gives an approximate
> result even when no TOPN is present in the query.
> 2. When the cardinality is 1, TOPN does not give an accurate result.
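
As a point of comparison for the second question, here is a minimal sketch of a textbook Space-Saving top-N counter, written in Python. It is an assumption for illustration only and is not Kylin's TOPN implementation; the class name, the function names, and the sample rows are all hypothetical. In this sketch the only source of approximation is eviction when the counter is full, so with a single distinct key it should return the exact sum even when some values are negative.

```python
class SpaceSavingSketch:
    """Textbook Space-Saving counter (hypothetical, not Kylin's code)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.counts = {}

    def offer(self, key, value):
        if key in self.counts:
            # Known key: plain accumulation, exact even for negative values.
            self.counts[key] += value
        elif len(self.counts) < self.capacity:
            # Free slot: start tracking the key exactly.
            self.counts[key] = value
        else:
            # Full: evict the smallest counter and let the newcomer inherit
            # its count. This is the only approximate step in the sketch.
            victim = min(self.counts, key=self.counts.get)
            inherited = self.counts.pop(victim)
            self.counts[key] = inherited + value


def exact_sums(rows):
    """Reference result: exact SUM(value) grouped by key."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


# Made-up rows with one distinct key and values drawn from [1, 0, -1].
rows = [("d2", 1)] * 3 + [("d2", 0)] * 2 + [("d2", -1)] * 5

sketch = SpaceSavingSketch(capacity=10)
for key, value in rows:
    sketch.offer(key, value)

print(sketch.counts)
print(exact_sums(rows))
```

Both printed dictionaries should be identical here, because no eviction ever happens with one key. If Kylin's counter behaves like this sketch, approximation alone would not account for 7 versus -2 at cardinality 1, which would point toward some other step in the rewrite or merge path. That is a hypothesis, not a diagnosis.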