hongbin ma created KYLIN-934:
--------------------------------

             Summary: Negative number in SUM result and Kylin results not 
matching exactly Hive results
                 Key: KYLIN-934
                 URL: https://issues.apache.org/jira/browse/KYLIN-934
             Project: Kylin
          Issue Type: Bug
    Affects Versions: v0.7.2
            Reporter: hongbin ma
            Assignee: hongbin ma


I have a simple cube with a Fact Table and a Lookup Table. The Fact Table
includes some counter. The Lookup Table includes a region property which is
the highest level of the hierarchy and therefore can accumulate quite a lot
of data when summing the counter by region.

I am doing a query like this:

select sum(MY_COUNTER) as tot_count, REGION

from FACT_TABLE

join LOOKUP_TABLE

on FACT_TABLE.FK = LOOKUP_TABLE.PK

where date = '2015-01-01'

group by REGION


I get the following result:

-917,164,421 EMEA --> negative result!

777,795,808 AMER

514,879,134 APAC

117,814,485 LATAM

I ran the exact same query in Hive and got the following result:

3,381,591,228  EMEA  --> big number but smaller than the BIGINT limit

778,718,563    AMER  --> 922,755 difference with the Kylin result

520,253,610    APAC   --> 5,374,476 difference with the Kylin result

117,913,857    LATAM --> 99,372 difference with the Kylin result

Based on that it seems that the limit is the int limit 2^31-1
(2,147,483,647) and that my negative result comes from an overflow.

In my cube I defined this measure as a bigint:

Expression    Param Type    Param Value       Return Type

SUM              column            AVAIL_COUNT    bigint

The other thing that worries me a bit is that the other numbers are not
100% accurate either (>100,000,000 difference!)


===================================================
reported by alex schufo  [email protected]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to