I'll try to reproduce the issue in my own environment.

On Wed, Aug 5, 2015 at 11:51 PM, Luke Han <[email protected]> wrote:

> That's fine. Please change your Hive schema first to make sure the types are the same, then try again.
>
> Thanks.
>
> Best Regards!
> ---------------------
> Luke Han
>
> On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected]> wrote:
>
>> I am using 0.7.2.
>>
>> The Hive column is int. Does that mean I should modify the existing Hive table to use bigint as well, so that the Kylin count is a bigint?
>>
>> I probably won't be allowed to share this kind of data outside my organization, unfortunately.
>>
>> On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]> wrote:
>>
>>> Besides what Luke asked, can you also share which version you are using? We'll look into this.
>>>
>>> On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> wrote:
>>>
>>>> What is the data type of that column in your Hive table? We recommend using the same type as your measure definition.
>>>>
>>>> For data quality, we have an approach to ensure results 100% match the source data; there is an H2 comparison in our test cases. But yes, let's double-check the test cases for huge numbers like in your case.
>>>>
>>>> Is it possible to share your data and model with us, if they are not sensitive? It would be better to use them for verification and testing here.
>>>>
>>>> Thanks.
>>>>
>>>> Best Regards!
>>>> ---------------------
>>>> Luke Han
>>>>
>>>> On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]> wrote:
>>>>
>>>>> I have a simple cube with a fact table and a lookup table. The fact table includes a counter. The lookup table includes a region property, which is the highest level of the hierarchy and can therefore accumulate quite a lot of data when summing the counter by region.
>>>>>
>>>>> I am doing a query like this:
>>>>>
>>>>> select sum(MY_COUNTER) as tot_count, REGION
>>>>> from FACT_TABLE
>>>>> join LOOKUP_TABLE
>>>>> on FACT_TABLE.FK = LOOKUP_TABLE.PK
>>>>> where date = '2015-01-01'
>>>>> group by REGION
>>>>>
>>>>> I get the following result:
>>>>>
>>>>> -917,164,421  EMEA   --> negative result!
>>>>>  777,795,808  AMER
>>>>>  514,879,134  APAC
>>>>>  117,814,485  LATAM
>>>>>
>>>>> I ran the exact same query in Hive and got the following result:
>>>>>
>>>>> 3,381,591,228  EMEA   --> big number, but smaller than the BIGINT limit
>>>>>   778,718,563  AMER   --> 922,755 difference from the Kylin result
>>>>>   520,253,610  APAC   --> 5,374,476 difference from the Kylin result
>>>>>   117,913,857  LATAM  --> 99,372 difference from the Kylin result
>>>>>
>>>>> Based on that, it seems the limit is the int limit 2^31-1 (2,147,483,647) and that my negative result comes from an overflow.
>>>>>
>>>>> In my cube I defined this measure as a bigint:
>>>>>
>>>>> Expression  Param Type  Param Value  Return Type
>>>>> SUM         column      AVAIL_COUNT  bigint
>>>>>
>>>>> The other thing that worries me a bit is that the other numbers are not 100% accurate either (>100,000,000 difference!)
>>>
>>> --
>>> Regards,
>>> *Bin Mahone | 马洪宾*
>>> Apache Kylin: http://kylin.io
>>> Github: https://github.com/binmahone

--
Regards,
*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
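[Editor's note] Luke's advice to align the Hive schema with the measure's return type could be sketched as below. This is a hypothetical example: `FACT_TABLE` and `AVAIL_COUNT` are the names that appear in the thread, and `ALTER TABLE ... CHANGE COLUMN` is standard HiveQL for widening a column type, but verify the behavior against your Hive version before running it on real data.

```sql
-- Hypothetical sketch: widen the source column so that Hive and the
-- Kylin SUM measure both use bigint. Substitute your actual names.
ALTER TABLE FACT_TABLE CHANGE COLUMN AVAIL_COUNT AVAIL_COUNT BIGINT;
```

After changing the schema, the cube would need to be rebuilt so the measure is re-aggregated from the widened column.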
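[Editor's note] The overflow mechanism described in the thread can be illustrated with a short sketch. This is not Kylin's actual code path, only a demonstration of signed 32-bit wrap-around: if the sum is accumulated in a 32-bit int, a true total above 2^31-1 wraps modulo 2^32 and comes out negative, much like the EMEA figure above. (The wrapped value here differs slightly from the reported -917,164,421, consistent with the small per-region discrepancies also observed.)

```python
import ctypes

# True EMEA total as reported by Hive; it exceeds the signed
# 32-bit maximum of 2**31 - 1 = 2,147,483,647.
hive_emea_total = 3_381_591_228

# Truncating to a signed 32-bit int wraps the value modulo 2**32,
# producing a negative number, as a 32-bit accumulator would.
wrapped = ctypes.c_int32(hive_emea_total).value
print(wrapped)  # -913376068: negative, like the Kylin result
```

A bigint (64-bit) accumulator avoids this, since its range extends to 2^63-1.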
