I am using 0.7.2.

The Hive column is int, does that mean I should try to modify the existing
Hive table to move to Bigint as well to have the Kylin count as a Bigint?

I probably won't be allowed to share this kind of data outside my
organization unfortunately.

On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]> wrote:

> except for what Luke asked, can you also share which version are you using?
> We'll look into this
>
> On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> wrote:
>
> > What's the data type in your hive table of that column?
> > We recommend to use same as your measure definition.
> >
> > For data quality, we have approach to ensure it will 100%
> > match with source data, there's H2 comparison in our testing case.
> > But yes, let's double check the test case about huge number like your
> case
> >
> > Is it possible to share your data and model to us if not sensitive?
> > It will be better to using that to do the verify and testing here.
> >
> > Thanks.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]>
> wrote:
> >
> > > I have a simple cube with a Fact Table and a Lookup Table. The Fact
> Table
> > > includes some counter. The Lookup Table includes a region property
> which
> > is
> > > the highest level of the hierarchy and therefore can accumulate quite a
> > lot
> > > of data when summing the counter by region.
> > >
> > > I am doing a query like this:
> > >
> > > select sum(MY_COUNTER) as tot_count, REGION
> > >
> > > from FACT_TABLE
> > >
> > > join LOOKUP_TABLE
> > >
> > > on FACT_TABLE.FK = LOOKUP_TABLE.PK
> > >
> > > where date = '2015-01-01'
> > >
> > > group by REGION
> > >
> > >
> > > I get the following result:
> > >
> > > -917,164,421 EMEA --> negative result!
> > >
> > > 777,795,808 AMER
> > >
> > > 514,879,134 APAC
> > >
> > > 117,814,485 LATAM
> > >
> > > I ran the exact same query in Hive and got the following result:
> > >
> > > 3,381,591,228  EMEA  --> big number but smaller than the BIGINT limit
> > >
> > > 778,718,563    AMER  --> 922,755 difference with the Kylin result
> > >
> > > 520,253,610    APAC   --> 5,374,476 difference with the Kylin result
> > >
> > > 117,913,857    LATAM --> 99,372 difference with the Kylin result
> > >
> > > Based on that it seems that the limit is the int limit 2^31-1
> > > (2,147,483,647) and that my negative result comes from an overflow.
> > >
> > > In my cube I defined this measure as a bigint:
> > >
> > > Expression    Param Type    Param Value       Return Type
> > >
> > > SUM              column            AVAIL_COUNT    bigint
> > >
> > > The other thing that worries me a bit is that the other numbers are not
> > > 100% accurate either (>100,000,000 difference!)
> > >
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

Reply via email to