[jira] [Updated] (KYLIN-2617) SUM when rewritten as TOPN does not give consistent (correct) result?

Shaofeng SHI (JIRA) Mon, 05 Feb 2018 01:23:18 -0800

     [ https://issues.apache.org/jira/browse/KYLIN-2617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Shaofeng SHI updated KYLIN-2617:
--------------------------------
    Component/s: Measure - TopN

> SUM when rewritten as TOPN does not give consistent (correct) result?
> ---------------------------------------------------------------------
>
>                 Key: KYLIN-2617
>                 URL: https://issues.apache.org/jira/browse/KYLIN-2617
>             Project: Kylin
>          Issue Type: Bug
>          Components: Measure - TopN
>            Reporter: liyang
>            Priority: Major
>
> From Tingmao Lin <[email protected]>
> We found that a SUM() query on a cardinality-1 dimension is not accurate (or 
> "not correct") when it is automatically rewritten as TOPN.
> Is that the expected behavior of Kylin, or is there some other issue?
> We built a cube on a table (measure1: bigint, dim1_id: varchar, 
> dim2_id: varchar, ...) using Kylin 1.6.0 (Kafka streaming source).
> The cube has two measures, SUM(measure1) and 
> TOPN(10,sum-orderby(measure1),group by dim2_id) (other measures omitted),
> and two dimensions, dim1_id and dim2_id (other dimensions omitted).
> About the source table data:
> The cardinality of dim1_id is 1 (same dim1_id for all rows in the source 
> table).
> The cardinality of dim2_id is 1 (same dim2_id for all rows in the source 
> table).
> The possible values of measure1 are [1,0,-1].
>  
> When we query
>    "select SUM(measure1) FROM table GROUP BY dim2_id"
> the result has one row: "sum=7".
> From the Kylin logs we found that the query had been automatically 
> rewritten as TOPN(measure1,sum-orderby(measure1),group by dim2_id).
> When we write another query that prevents the TOPN rewrite, for example:
>    "select SUM(measure1),count(*) FROM table GROUP BY dim2_id"
>        =>   one row: "sum=-2,count=24576"
>    "select SUM(measure1),count(*) FROM table"
>        =>   one row: "sum=-2,count=24576"
> The result differs (7 versus -2) depending on whether the query is rewritten 
> to TOPN.
> My question is: do the following behaviors "work as expected", does the TOPN 
> algorithm not support negative counter values very well, or is there some 
> other issue?
> 1. A SUM() query is automatically rewritten as TOPN and gives an approximate 
> result even though no TOPN is present in the query.
> 2. When cardinality is 1, TOPN does not give an accurate result.
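
The expectation behind point 2 can be illustrated with a minimal sketch. It uses a generic capacity-bounded counter, which is an assumption made for illustration and not Kylin's actual TopN implementation; the function names and the sample rows are hypothetical. With a single group key nothing is ever evicted, so such a counter would be expected to agree with the exact SUM even when values are negative.

```python
from collections import defaultdict


def exact_sum(rows):
    """Exact SUM(value) GROUP BY key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[key] += value
    return dict(totals)


def bounded_topn_sum(rows, capacity):
    """Generic bounded counter (NOT Kylin's code): keeps at most
    `capacity` keys and evicts the key with the smallest running sum."""
    counters = {}
    for key, value in rows:
        counters[key] = counters.get(key, 0) + value
        if len(counters) > capacity:
            victim = min(counters, key=counters.get)
            del counters[victim]
    return counters


# Hypothetical data: one dim2_id value, measure1 drawn from {1, 0, -1}.
rows = [("d2", v) for v in [1] * 10 + [-1] * 12 + [0] * 5]

print(exact_sum(rows))             # {'d2': -2}
print(bounded_topn_sum(rows, 10))  # {'d2': -2}
```

Under these assumptions both results match, which is why a difference such as 7 versus -2 on a cardinality-1 dimension looks like a defect rather than ordinary approximation error.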



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
