I'll try to reproduce the issue in my own environment.

On Wed, Aug 5, 2015 at 11:51 PM, Luke Han <[email protected]> wrote:

> That's fine. Please change your Hive schema first to make sure the types are the same, then try again.
>
> Thanks.
>
> Best Regards!
> ---------------------
> Luke Han
>
> On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected]> wrote:
>
>> I am using 0.7.2.
>>
>> The Hive column is int. Does that mean I should modify the existing Hive table to use bigint as well, so that the Kylin count is a bigint?
>>
>> I probably won't be allowed to share this kind of data outside my organization, unfortunately.
>>
>> On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]> wrote:
>>
>>> Besides what Luke asked, can you also share which version you are using? We'll look into this.
>>>
>>> On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> wrote:
>>>
>>>> What is the data type of that column in your Hive table? We recommend using the same type as your measure definition.
>>>>
>>>> For data quality, we have an approach to ensure results 100% match the source data; there is an H2 comparison in our test cases. But yes, let's double-check the test cases for huge numbers like in your case.
>>>>
>>>> Is it possible to share your data and model with us, if they are not sensitive? It would be better to use them for verification and testing here.
>>>>
>>>> Thanks.
>>>>
>>>> Best Regards!
>>>> ---------------------
>>>> Luke Han
>>>>
>>>> On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]> wrote:
>>>>
>>>>> I have a simple cube with a fact table and a lookup table. The fact table includes a counter. The lookup table includes a region property, which is the highest level of the hierarchy and can therefore accumulate quite a lot of data when summing the counter by region.
>>>>>
>>>>> I am doing a query like this:
>>>>>
>>>>> select sum(MY_COUNTER) as tot_count, REGION
>>>>> from FACT_TABLE
>>>>> join LOOKUP_TABLE
>>>>> on FACT_TABLE.FK = LOOKUP_TABLE.PK
>>>>> where date = '2015-01-01'
>>>>> group by REGION
>>>>>
>>>>> I get the following result:
>>>>>
>>>>> -917,164,421  EMEA   --> negative result!
>>>>>  777,795,808  AMER
>>>>>  514,879,134  APAC
>>>>>  117,814,485  LATAM
>>>>>
>>>>> I ran the exact same query in Hive and got the following result:
>>>>>
>>>>> 3,381,591,228  EMEA   --> big number, but smaller than the BIGINT limit
>>>>>   778,718,563  AMER   --> 922,755 difference from the Kylin result
>>>>>   520,253,610  APAC   --> 5,374,476 difference from the Kylin result
>>>>>   117,913,857  LATAM  --> 99,372 difference from the Kylin result
>>>>>
>>>>> Based on that, it seems the limit is the int limit 2^31-1 (2,147,483,647) and that my negative result comes from an overflow.
>>>>>
>>>>> In my cube I defined this measure as a bigint:
>>>>>
>>>>> Expression  Param Type  Param Value  Return Type
>>>>> SUM         column      AVAIL_COUNT  bigint
>>>>>
>>>>> The other thing that worries me a bit is that the other numbers are not 100% accurate either (>100,000,000 difference!)
>>>
>>> --
>>> Regards,
>>> *Bin Mahone | 马洪宾*
>>> Apache Kylin: http://kylin.io
>>> Github: https://github.com/binmahone

--
Regards,
*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
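[Editor's note] Luke's advice to align the Hive schema with the measure's return type could be sketched as below. This is a hypothetical example: `FACT_TABLE` and `AVAIL_COUNT` are the names that appear in the thread, and `ALTER TABLE ... CHANGE COLUMN` is standard HiveQL for widening a column type, but verify the behavior against your Hive version before running it on real data.

```sql
-- Hypothetical sketch: widen the source column so that Hive and the
-- Kylin SUM measure both use bigint. Substitute your actual names.
ALTER TABLE FACT_TABLE CHANGE COLUMN AVAIL_COUNT AVAIL_COUNT BIGINT;
```

After changing the schema, the cube would need to be rebuilt so the measure is re-aggregated from the widened column.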
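[Editor's note] The overflow mechanism described in the thread can be illustrated with a short sketch. This is not Kylin's actual code path, only a demonstration of signed 32-bit wrap-around: if the sum is accumulated in a 32-bit int, a true total above 2^31-1 wraps modulo 2^32 and comes out negative, much like the EMEA figure above. (The wrapped value here differs slightly from the reported -917,164,421, consistent with the small per-region discrepancies also observed.)

```python
import ctypes

# True EMEA total as reported by Hive; it exceeds the signed
# 32-bit maximum of 2**31 - 1 = 2,147,483,647.
hive_emea_total = 3_381_591_228

# Truncating to a signed 32-bit int wraps the value modulo 2**32,
# producing a negative number, as a 32-bit accumulator would.
wrapped = ctypes.c_int32(hive_emea_total).value
print(wrapped)  # -913376068: negative, like the Kylin result
```

A bigint (64-bit) accumulator avoids this, since its range extends to 2^63-1.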
