I have a simple cube with a Fact Table and a Lookup Table. The Fact Table
includes a counter column. The Lookup Table includes a REGION property,
which is the highest level of the hierarchy, so summing the counter by
region aggregates quite a lot of data.
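
For reference, here is a hypothetical minimal Hive schema matching that
description (table and column names are taken from the query below; the
types, in particular MY_COUNTER being a plain int in the source, are my
assumptions, not something confirmed above):

-- hypothetical schema; names from the query below, types assumed
create table LOOKUP_TABLE (
  PK      bigint,
  REGION  string
);
create table FACT_TABLE (
  FK          bigint,
  MY_COUNTER  int,     -- assumed int: the per-region sums exceed 2^31-1
  `date`      string   -- backticked because "date" is a keyword in newer Hive
);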

I am doing a query like this:

select sum(MY_COUNTER) as tot_count, REGION
from FACT_TABLE
join LOOKUP_TABLE on FACT_TABLE.FK = LOOKUP_TABLE.PK
where date = '2015-01-01'
group by REGION


I get the following result:

-917,164,421  EMEA   --> negative result!
 777,795,808  AMER
 514,879,134  APAC
 117,814,485  LATAM

I ran the exact same query in Hive and got the following result:

3,381,591,228  EMEA   --> big number but smaller than the BIGINT limit
  778,718,563  AMER   --> 922,755 difference with the Kylin result
  520,253,610  APAC   --> 5,374,476 difference with the Kylin result
  117,913,857  LATAM  --> 99,372 difference with the Kylin result

Based on those numbers it seems that the limit is the 32-bit int limit
2^31-1 (2,147,483,647) and that my negative result comes from an overflow:
-917,164,421 + 2^32 = 3,377,802,875, which is close to the Hive total for EMEA.
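
As a sanity check, simulating 32-bit signed wraparound on the Hive EMEA
total reproduces a number close to what Kylin returned (plain arithmetic,
runnable in Hive 0.13+ or most SQL engines; 2147483648 = 2^31 and
4294967296 = 2^32):

-- wrap the Hive EMEA total (3,381,591,228) into the signed 32-bit range
select ((3381591228 + 2147483648) % 4294967296) - 2147483648;
-- => -913,376,068, within ~3.8M of Kylin's -917,164,421
-- (the remaining gap is the same order as the other regions' discrepancies)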

In my cube I defined this measure as a bigint:

Expression    Param Type    Param Value    Return Type
SUM           column        AVAIL_COUNT    bigint
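
To rule out the source data, the same aggregation can be run in Hive with
an explicit bigint cast (a sketch, assuming the Hive tables match the
cube's source tables and that MY_COUNTER from the query above is the same
column as AVAIL_COUNT in the measure definition):

-- same aggregation with an explicit bigint cast on the measure column;
-- the totals should match the Hive numbers above
select REGION, sum(cast(MY_COUNTER as bigint)) as tot_count
from FACT_TABLE
join LOOKUP_TABLE on FACT_TABLE.FK = LOOKUP_TABLE.PK
where date = '2015-01-01'
group by REGION;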

The other thing that worries me a bit is that the other numbers are not
100% accurate either (differences ranging from ~99,000 up to ~5,400,000!).
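
To narrow down where those smaller discrepancies come from, one quick check
(a sketch, using the same assumed names as above) is to compare the raw row
count and unaggregated sum for the date slice in Hive against the cube
segment's reported input row count:

-- row count and raw sum for the date slice in the Hive source;
-- comparing these with the cube build's input row count helps separate
-- data drift at build time from an aggregation bug at query time
select count(*) as row_cnt,
       sum(cast(MY_COUNTER as bigint)) as raw_sum
from FACT_TABLE
where date = '2015-01-01';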
