That's fine. Please change your Hive schema first so the column type
matches your measure definition, then try again.
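
A minimal sketch of that schema change, assuming the names that appear later
in this thread (FACT_TABLE from the query, AVAIL_COUNT from the cube's
measure definition -- substitute your actual table and column names):

```sql
-- Hypothetical Hive DDL: widen the measure column to BIGINT so it matches
-- the cube's "bigint" return type. Table and column names are assumptions.
ALTER TABLE FACT_TABLE CHANGE AVAIL_COUNT AVAIL_COUNT BIGINT;
```

After the change you would likely also need to re-sync the table metadata in
Kylin and rebuild the cube so existing segments pick up the new type.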

Thanks.



Best Regards!
---------------------

Luke Han

On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected]> wrote:

> I am using 0.7.2.
>
> The Hive column is int, does that mean I should try to modify the existing
> Hive table to move to Bigint as well to have the Kylin count as a Bigint?
>
> I probably won't be allowed to share this kind of data outside my
> organization unfortunately.
>
> On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]> wrote:
>
> > Besides what Luke asked, can you also share which version you are
> > using? We'll look into this.
> >
> > On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> wrote:
> >
> > > What's the data type of that column in your Hive table?
> > > We recommend using the same type as your measure definition.
> > >
> > > For data quality, we have checks to ensure results match the source
> > > data 100%; our test suite includes a comparison against H2. But yes,
> > > let's double check the test cases for huge numbers like yours.
> > >
> > > Is it possible to share your data and model with us, if not sensitive?
> > > That would make it much easier to verify and test this here.
> > >
> > > Thanks.
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]>
> > wrote:
> > >
> > > > I have a simple cube with a Fact Table and a Lookup Table. The Fact
> > > > Table includes a counter. The Lookup Table includes a region
> > > > property, which is the highest level of the hierarchy and can
> > > > therefore accumulate quite a lot of data when summing the counter
> > > > by region.
> > > >
> > > > I am doing a query like this:
> > > >
> > > > select sum(MY_COUNTER) as tot_count, REGION
> > > > from FACT_TABLE
> > > > join LOOKUP_TABLE
> > > > on FACT_TABLE.FK = LOOKUP_TABLE.PK
> > > > where date = '2015-01-01'
> > > > group by REGION
> > > >
> > > > I get the following result:
> > > >
> > > > -917,164,421   EMEA  --> negative result!
> > > > 777,795,808    AMER
> > > > 514,879,134    APAC
> > > > 117,814,485    LATAM
> > > >
> > > > I ran the exact same query in Hive and got the following result:
> > > >
> > > > 3,381,591,228  EMEA  --> big number but smaller than the BIGINT limit
> > > > 778,718,563    AMER  --> 922,755 difference with the Kylin result
> > > > 520,253,610    APAC  --> 5,374,476 difference with the Kylin result
> > > > 117,913,857    LATAM --> 99,372 difference with the Kylin result
> > > >
> > > > Based on that it seems that the limit is the int limit 2^31-1
> > > > (2,147,483,647) and that my negative result comes from an overflow.
> > > >
> > > > In my cube I defined this measure as a bigint:
> > > >
> > > > Expression    Param Type    Param Value       Return Type
> > > > SUM           column        AVAIL_COUNT       bigint
> > > >
> > > > The other thing that worries me a bit is that the other numbers are
> > > > not 100% accurate either (up to a 5,374,476 difference!)
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>
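
As a quick sanity check on the overflow reading above, adding 2^32 back to
the negative EMEA figure recovers a plausible 64-bit sum (a sketch; the
arithmetic is the point, any SQL engine will do):

```sql
-- Undoing a 32-bit wrap: add 2^32 back to the negative result.
-- -917164421 + 4294967296 = 3377802875, within ~3.8 million of Hive's
-- 3381591228 -- the same order of magnitude as the other per-region
-- discrepancies, which suggests a 32-bit accumulator overflow on top of
-- a separate, smaller data mismatch.
SELECT -917164421 + 4294967296 AS unwrapped_emea_sum;
```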
