I spent some time to debug the kylin code and I think it's very likely a bug in
Query module.
The cube and the fact table was illustrated before in the last couple emails.
The query I used was "select id, sum(trans_at), max(trans_at) from max_neg
group by id;".
When the query hit the rest server, it will be passed to the function "private
SQLResponse execute(String sql, SQLRequest sqlRequest)" of "QueryService" class
in kylin-server module.
And in that function, it utilizes the calcite to parse the sql and the
kylin-storage module to actually fetch data from hbase. However, the following
codes of the function packages wrong results.
while (resultSet.next()) {
for (int i = 0; i < columnCount; i++) {
oneRow.add((resultSet.getString(i + 1))); -- This produce
wrong result like [2c9083862fe1d8f00130062167480ce8-213582789, -86.4, 4.9E-324]
}
results.add(new LinkedList<String>(oneRow));
oneRow.clear();
}
I also checked the related functions of kylin-storage, and the results fetched
from hbase were all good, for example, the function "public Tuple next()" of
"CubeSegmentTupleIterator" class returned
[2c9083862fe1d8f00130062167480ce8-0114097017, -59.0, 1, -59.0] and the function
"private Object[] convertCurrentRow(ITuple tuple)" of "CubeEnumerator" class
also returned [2c9083862fe1d8f00130062167480ce8-0114097017, -59.0, 1, -59.0]
which I think both are correct.
Unfortunately I am still looking for the exact piece of codes which cause the
problem, can anyone help look into this issue? Thanks.
Hua
> -----邮件原件-----
> 发件人: Huang Hua [mailto:[email protected]]
> 发送时间: 2015年4月30日 12:10
> 收件人: 'Li Yang'; [email protected]
> 主题: 答复: Max on negative double values is not working
>
> We defined the max(trans_at) as one of the measures of the cube.
>
> our cube json looks like:
> {
> "uuid": "4d57736b-d3e4-40fe-8480-cbc80f78b4ef",
> "name": "test_max_0",
> "description": "",
> "dimensions": [
> {
> "id": 1,
> "name": "ID",
> "join": null,
> "hierarchy": null,
> "table": "TEST_MAX_ON_NEG",
> "column": "ID",
> "datatype": null,
> "derived": null
> }
> ],
> "measures": [
> {
> "id": 1,
> "name": "_COUNT_",
> "function": {
> "expression": "COUNT",
> "parameter": {
> "type": "constant",
> "value": "1"
> },
> "returntype": "bigint"
> },
> "dependent_measure_ref": null
> },
> {
> "id": 2,
> "name": "SUM_TRANS_AT",
> "function": {
> "expression": "SUM",
> "parameter": {
> "type": "column",
> "value": "TRANS_AT"
> },
> "returntype": "double"
> },
> "dependent_measure_ref": null
> },
> {
> "id": 3,
> "name": "MAX_TRANS_AT",
> "function": {
> "expression": "MAX",
> "parameter": {
> "type": "column",
> "value": "TRANS_AT"
> },
> "returntype": "double"
> },
> "dependent_measure_ref": null
> }
> ],
> "rowkey": {
> "rowkey_columns": [
> {
> "column": "ID",
> "length": 0,
> "dictionary": "true",
> "mandatory": false
> }
> ],
> "aggregation_groups": [
> [
> "ID"
> ]
> ]
> },
> "signature": "R/zfoefWoDBHzfzSUh5thA==",
> "capacity": "MEDIUM",
> "last_modified": 1430279551744,
> "fact_table": "TEST_MAX_ON_NEG",
> "null_string": null,
> "filter_condition": null,
> "cube_partition_desc": {
> "partition_date_column": null,
> "partition_date_start": 0,
> "cube_partition_type": "APPEND"
> },
> "hbase_mapping": {
> "column_family": [
> {
> "name": "F1",
> "columns": [
> {
> "qualifier": "M",
> "measure_refs": [
> "_COUNT_",
> "SUM_TRANS_AT",
> "MAX_TRANS_AT"
> ]
> }
> ]
> }
> ]
> },
> "notify_list": []
> }
>
> The cube is based on only a fact table called test_max_on_neg which has
> only six records:
>
> 2c9083862fe1d8f00130062167480ce8-211316617 -199.6
> 2c9083862fe1d8f00130062167480ce8-21782293 -229.0
> 2c9083862fe1d8f00130062167480ce8-211955199 -151.7
> 2c9083862fe1d8f00130062167480ce8-213582789 -86.4
> 2c9083862fe1d8f00130062167480ce8-115552384 -353.5
> 2c9083862fe1d8f00130062167480ce8-0114097017 -59.0
>
> When the cube is ready, we did the query: select id, max(trans_at),
> sum(trans_at) from test_max_on_neg group by id, which returned 4.9E-324
> for MAX(trans_at) column. I think the problem still exists for kylin-0.6.6,
> and
> you can reproduce the problem if you like.
>
> > -----邮件原件-----
> > 发件人: Li Yang [mailto:[email protected]]
> > 发送时间: 2015年4月29日 14:56
> > 收件人: [email protected]
> > 主题: Re: Max on negative double values is not working
> >
> > Depend on whether max(trans_at) is a measure of cube or not, the
> > problem and the answer will be different.
> >
> >
> > On Wed, Apr 29, 2015 at 10:43 AM, hongbin ma <[email protected]>
> > wrote:
> >
> > > can you share your cube's definition? you can go to cubes tab, click
> > > on the cube, and get json tab.
> > >
> > > we'd like to help check the definition of your cube
> > >
> > > On Tue, Apr 28, 2015 at 8:01 PM, Huang Hua
> <[email protected]>
> > > wrote:
> > >
> > > > Hi there,
> > > >
> > > >
> > > >
> > > > We are using kylin 0.6.6, and we found out the max function might
> > > > not
> > > work
> > > > for negative values in double type.
> > > >
> > > >
> > > >
> > > > The fact table in Hive only has two columns: id and trans_at and
> > > > contains six records as shown below:
> > > >
> > > > Id trans_at
> > > >
> > > > 1, -59.0
> > > >
> > > > 2, -229.0
> > > >
> > > > 3, -86.4
> > > >
> > > > 4, -151.7
> > > >
> > > > 5, -199.6
> > > >
> > > > 6, -353.5
> > > >
> > > >
> > > >
> > > > Then based on the fact table, we created and built a cube with 1
> > > dimension
> > > > and 3 measures including the default _count_ one:
> > > >
> > > > Dimension: id
> > > >
> > > > Measures: max(trans_at), sum(trans_at), _count_
> > > >
> > > >
> > > >
> > > > The sql used in query was: select id, MAX(trans_at), SUM(trans_at)
> > > > from the_table group by id, which produced the following output:
> > > >
> > > >
> > > >
> > > > ID,EXPR$1,EXPR$2
> > > >
> > > > 1, 4.9E-324, -59.0
> > > >
> > > > 2, 4.9E-324, -229.0
> > > >
> > > > 3, 4.9E-324, -86.4
> > > >
> > > > 4, 4.9E-324, -151.7
> > > >
> > > > 5, 4.9E-324, -199.6
> > > >
> > > > 6, 4.9E-324, -353.5
> > > >
> > > >
> > > >
> > > > As you can see, max(trans_at) produced 4.9E-324 for each row which
> > > > is not what we expect to see.
> > > >
> > > >
> > > >
> > > > Best Regards.
> > > >
> > > > Hua
> > > >
> > > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > *Bin Mahone | 马洪宾*
> > > Apache Kylin: http://kylin.io
> > > Github: https://github.com/binmahone
> > >
>
>