hongbin ma created KYLIN-934:
--------------------------------
Summary: Negative number in SUM result and Kylin results not
matching exactly Hive results
Key: KYLIN-934
URL: https://issues.apache.org/jira/browse/KYLIN-934
Project: Kylin
Issue Type: Bug
Affects Versions: v0.7.2
Reporter: hongbin ma
Assignee: hongbin ma
I have a simple cube with a Fact Table and a Lookup Table. The Fact Table
includes some counter. The Lookup Table includes a region property which is
the highest level of the hierarchy and therefore can accumulate quite a lot
of data when summing the counter by region.
I am doing a query like this:
select sum(MY_COUNTER) as tot_count, REGION
from FACT_TABLE
join LOOKUP_TABLE
on FACT_TABLE.FK = LOOKUP_TABLE.PK
where date = '2015-01-01'
group by REGION
I get the following result:
-917,164,421 EMEA --> negative result!
777,795,808 AMER
514,879,134 APAC
117,814,485 LATAM
I ran the exact same query in Hive and got the following result:
3,381,591,228 EMEA --> big number but smaller than the BIGINT limit
778,718,563 AMER --> 922,755 difference with the Kylin result
520,253,610 APAC --> 5,374,476 difference with the Kylin result
117,913,857 LATAM --> 99,372 difference with the Kylin result
Based on that it seems that the limit is the int limit 2^31-1
(2,147,483,647) and that my negative result comes from an overflow.
In my cube I defined this measure as a bigint:
Expression Param Type Param Value Return Type
SUM column AVAIL_COUNT bigint
The other thing that worries me a bit is that the other numbers are not
100% accurate either (>100,000,000 difference!)
===================================================
reported by alex schufo [email protected]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)