I spent some time to debug the kylin code and I think it's very likely a bug in 
Query module.

The cube and the fact table was illustrated before in the last couple emails. 
The query I used was "select id, sum(trans_at), max(trans_at) from max_neg 
group by id;".
When the query hit the rest server, it will be passed to the function "private 
SQLResponse execute(String sql, SQLRequest sqlRequest)" of "QueryService" class 
in kylin-server module.
And in that function, it utilizes the calcite to parse the sql and the 
kylin-storage module to actually fetch data from hbase. However, the following 
codes of the function packages wrong results.
            while (resultSet.next()) {
                for (int i = 0; i < columnCount; i++) {
                    oneRow.add((resultSet.getString(i + 1))); -- This produce 
wrong result like [2c9083862fe1d8f00130062167480ce8-213582789, -86.4, 4.9E-324]
                }

                results.add(new LinkedList<String>(oneRow));
                oneRow.clear();
            }

I also checked the related functions of kylin-storage, and the results fetched 
from hbase were all good, for example, the function "public Tuple next()" of 
"CubeSegmentTupleIterator" class returned 
[2c9083862fe1d8f00130062167480ce8-0114097017, -59.0, 1, -59.0] and the function 
"private Object[] convertCurrentRow(ITuple tuple)" of "CubeEnumerator" class 
also returned [2c9083862fe1d8f00130062167480ce8-0114097017, -59.0, 1, -59.0] 
which I think both are correct. 

Unfortunately I am still looking for the exact piece of codes which cause the 
problem, can anyone help look into this issue? Thanks.

Hua

> -----邮件原件-----
> 发件人: Huang Hua [mailto:[email protected]]
> 发送时间: 2015年4月30日 12:10
> 收件人: 'Li Yang'; [email protected]
> 主题: 答复: Max on negative double values is not working
> 
> We defined the max(trans_at) as one of the measures of the cube.
> 
> our cube json looks like:
> {
>   "uuid": "4d57736b-d3e4-40fe-8480-cbc80f78b4ef",
>   "name": "test_max_0",
>   "description": "",
>   "dimensions": [
>     {
>       "id": 1,
>       "name": "ID",
>       "join": null,
>       "hierarchy": null,
>       "table": "TEST_MAX_ON_NEG",
>       "column": "ID",
>       "datatype": null,
>       "derived": null
>     }
>   ],
>   "measures": [
>     {
>       "id": 1,
>       "name": "_COUNT_",
>       "function": {
>         "expression": "COUNT",
>         "parameter": {
>           "type": "constant",
>           "value": "1"
>         },
>         "returntype": "bigint"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "id": 2,
>       "name": "SUM_TRANS_AT",
>       "function": {
>         "expression": "SUM",
>         "parameter": {
>           "type": "column",
>           "value": "TRANS_AT"
>         },
>         "returntype": "double"
>       },
>       "dependent_measure_ref": null
>     },
>     {
>       "id": 3,
>       "name": "MAX_TRANS_AT",
>       "function": {
>         "expression": "MAX",
>         "parameter": {
>           "type": "column",
>           "value": "TRANS_AT"
>         },
>         "returntype": "double"
>       },
>       "dependent_measure_ref": null
>     }
>   ],
>   "rowkey": {
>     "rowkey_columns": [
>       {
>         "column": "ID",
>         "length": 0,
>         "dictionary": "true",
>         "mandatory": false
>       }
>     ],
>     "aggregation_groups": [
>       [
>         "ID"
>       ]
>     ]
>   },
>   "signature": "R/zfoefWoDBHzfzSUh5thA==",
>   "capacity": "MEDIUM",
>   "last_modified": 1430279551744,
>   "fact_table": "TEST_MAX_ON_NEG",
>   "null_string": null,
>   "filter_condition": null,
>   "cube_partition_desc": {
>     "partition_date_column": null,
>     "partition_date_start": 0,
>     "cube_partition_type": "APPEND"
>   },
>   "hbase_mapping": {
>     "column_family": [
>       {
>         "name": "F1",
>         "columns": [
>           {
>             "qualifier": "M",
>             "measure_refs": [
>               "_COUNT_",
>               "SUM_TRANS_AT",
>               "MAX_TRANS_AT"
>             ]
>           }
>         ]
>       }
>     ]
>   },
>   "notify_list": []
> }
> 
> The cube is based on only a fact table called test_max_on_neg which has
> only six records:
> 
> 2c9083862fe1d8f00130062167480ce8-211316617        -199.6
> 2c9083862fe1d8f00130062167480ce8-21782293 -229.0
> 2c9083862fe1d8f00130062167480ce8-211955199        -151.7
> 2c9083862fe1d8f00130062167480ce8-213582789        -86.4
> 2c9083862fe1d8f00130062167480ce8-115552384        -353.5
> 2c9083862fe1d8f00130062167480ce8-0114097017      -59.0
> 
> When the cube is ready, we did the query: select id, max(trans_at),
> sum(trans_at) from test_max_on_neg group by id, which returned 4.9E-324
> for MAX(trans_at) column. I think the problem still exists for kylin-0.6.6, 
> and
> you can reproduce the problem if you like.
> 
> > -----邮件原件-----
> > 发件人: Li Yang [mailto:[email protected]]
> > 发送时间: 2015年4月29日 14:56
> > 收件人: [email protected]
> > 主题: Re: Max on negative double values is not working
> >
> > Depend on whether max(trans_at) is a measure of cube or not, the
> > problem and the answer will be different.
> >
> >
> > On Wed, Apr 29, 2015 at 10:43 AM, hongbin ma <[email protected]>
> > wrote:
> >
> > > can you share your cube's definition? you can go to cubes tab, click
> > > on the cube, and get json tab.
> > >
> > > we'd like to help check the definition of your cube
> > >
> > > On Tue, Apr 28, 2015 at 8:01 PM, Huang Hua
> <[email protected]>
> > > wrote:
> > >
> > > > Hi there,
> > > >
> > > >
> > > >
> > > > We are using kylin 0.6.6, and we found out the max function might
> > > > not
> > > work
> > > > for negative values in double type.
> > > >
> > > >
> > > >
> > > > The fact table in Hive only has two columns: id and trans_at and
> > > > contains six records as shown below:
> > > >
> > > > Id trans_at
> > > >
> > > > 1, -59.0
> > > >
> > > > 2, -229.0
> > > >
> > > > 3, -86.4
> > > >
> > > > 4, -151.7
> > > >
> > > > 5, -199.6
> > > >
> > > > 6, -353.5
> > > >
> > > >
> > > >
> > > > Then based on the fact table, we created and built a cube with 1
> > > dimension
> > > > and 3 measures including the default _count_ one:
> > > >
> > > > Dimension: id
> > > >
> > > > Measures: max(trans_at), sum(trans_at), _count_
> > > >
> > > >
> > > >
> > > > The sql used in query was: select id, MAX(trans_at), SUM(trans_at)
> > > > from the_table group by id, which produced the following output:
> > > >
> > > >
> > > >
> > > > ID,EXPR$1,EXPR$2
> > > >
> > > > 1, 4.9E-324, -59.0
> > > >
> > > > 2, 4.9E-324, -229.0
> > > >
> > > > 3, 4.9E-324, -86.4
> > > >
> > > > 4, 4.9E-324, -151.7
> > > >
> > > > 5, 4.9E-324, -199.6
> > > >
> > > > 6, 4.9E-324, -353.5
> > > >
> > > >
> > > >
> > > > As you can see, max(trans_at) produced 4.9E-324 for each row which
> > > > is not what we expect to see.
> > > >
> > > >
> > > >
> > > > Best Regards.
> > > >
> > > > Hua
> > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
> 
> 



Reply via email to