What is the data type of that column in your Hive table?
We recommend using the same type there as in your measure definition.
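
For example, here is a minimal sketch (reusing the FACT_TABLE and
AVAIL_COUNT names from your mail) of how to check the column's declared
type in Hive and widen it if it turns out to still be INT:

    -- Inspect the declared type of the measure column
    DESCRIBE FACT_TABLE;

    -- If AVAIL_COUNT is declared as INT, widen it to BIGINT so it
    -- matches the bigint return type of the SUM measure in the cube
    ALTER TABLE FACT_TABLE CHANGE AVAIL_COUNT AVAIL_COUNT BIGINT;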

For data quality, we have an approach to ensure results match the
source data 100%: our test cases compare Kylin query results against H2.
But yes, let's double-check that the test cases cover huge numbers like the ones in your case.

If the data is not sensitive, is it possible to share your data and model with us?
That would make it easier to verify and test here.

Thanks.

Best Regards!
---------------------

Luke Han

On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]> wrote:

> I have a simple cube with a Fact Table and a Lookup Table. The Fact Table
> includes a counter column. The Lookup Table includes a region property,
> which is the highest level of the hierarchy and can therefore accumulate
> quite a lot of data when summing the counter by region.
>
> I am doing a query like this:
>
> select sum(MY_COUNTER) as tot_count, REGION
> from FACT_TABLE
> join LOOKUP_TABLE
>   on FACT_TABLE.FK = LOOKUP_TABLE.PK
> where date = '2015-01-01'
> group by REGION
>
> I get the following result:
>
> -917,164,421  EMEA   --> negative result!
>  777,795,808  AMER
>  514,879,134  APAC
>  117,814,485  LATAM
>
> I ran the exact same query in Hive and got the following result:
>
> 3,381,591,228  EMEA   --> big number but smaller than the BIGINT limit
>   778,718,563  AMER   --> 922,755 difference from the Kylin result
>   520,253,610  APAC   --> 5,374,476 difference from the Kylin result
>   117,913,857  LATAM  --> 99,372 difference from the Kylin result
>
> Based on that, it seems the effective limit is the 32-bit int limit 2^31-1
> (2,147,483,647) and that my negative result comes from an overflow.
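> As a sanity check on the wraparound: -917,164,421 + 2^32 = -917,164,421 +
> 4,294,967,296 = 3,377,802,875, which is close to the 3,381,591,228 Hive
> total, so a 32-bit overflow would explain the sign flip.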
>
> In my cube I defined this measure as a bigint:
>
> Expression    Param Type    Param Value    Return Type
> SUM           column        AVAIL_COUNT    bigint
>
> The other thing that worries me a bit is that the other numbers are not
> 100% accurate either (differences of up to several million, as shown above!)
>
