This is a QA environment where I fully control the Hive tables; I loaded this dataset manually and haven't changed it since. The cube itself was created yesterday, so no, I don't see any Hive update after the cube build.

I haven't had the chance to modify the Hive table to use BIGINT yet, but I will when I have time.
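For reference, the change should be a one-liner in Hive; a minimal sketch, assuming the column is the AVAIL_COUNT int column from the measure definition quoted below (FACT_TABLE is the placeholder name from my example query, not the real table):

-- widen the counter column so the Hive type matches the bigint measure
-- FACT_TABLE / AVAIL_COUNT are placeholder names from the thread below
alter table FACT_TABLE change AVAIL_COUNT AVAIL_COUNT bigint;

In Hive this only updates the table metadata; existing int values stay readable as bigint in the common file formats, but it's worth spot-checking the data after the change and before refreshing the cube.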
On Thu, Aug 6, 2015 at 10:17 AM, Li Yang <[email protected]> wrote:

> Integer overflow is possible. I do remember some FIXME there.
>
> An incorrect SUM is quite unlikely; a regression test reconciles SUM
> results with an H2 database every day. Maybe the Hive table was updated
> after the cube build? Try refreshing the cube to ensure it captures the
> latest Hive data.
>
> On Thu, Aug 6, 2015 at 9:23 AM, hongbin ma <[email protected]> wrote:
>
> > I'll try to reproduce the issue in my own environment.
> >
> > On Wed, Aug 5, 2015 at 11:51 PM, Luke Han <[email protected]> wrote:
> >
> > > That's fine. Please change your Hive schema first to make sure the
> > > types are the same, then try again.
> > >
> > > Thanks.
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected]> wrote:
> > >
> > > > I am using 0.7.2.
> > > >
> > > > The Hive column is int. Does that mean I should modify the existing
> > > > Hive table to move to BIGINT as well, to have the Kylin count as a
> > > > BIGINT?
> > > >
> > > > I probably won't be allowed to share this kind of data outside my
> > > > organization, unfortunately.
> > > >
> > > > On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]> wrote:
> > > >
> > > > > Besides what Luke asked, can you also share which version you are
> > > > > using? We'll look into this.
> > > > >
> > > > > On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> wrote:
> > > > >
> > > > > > What's the data type of that column in your Hive table? We
> > > > > > recommend using the same type as in your measure definition.
> > > > > >
> > > > > > For data quality, we have an approach to ensure results 100%
> > > > > > match the source data: there's an H2 comparison in our test
> > > > > > cases. But yes, let's double-check the test cases for huge
> > > > > > numbers like in your case.
> > > > > >
> > > > > > Is it possible to share your data and model with us, if they're
> > > > > > not sensitive? It would be better to use them for verification
> > > > > > and testing here.
> > > > > >
> > > > > > Thanks.
> > > > > >
> > > > > > Best Regards!
> > > > > > ---------------------
> > > > > >
> > > > > > Luke Han
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]> wrote:
> > > > > >
> > > > > > > I have a simple cube with a Fact Table and a Lookup Table. The
> > > > > > > Fact Table includes a counter. The Lookup Table includes a
> > > > > > > region property, which is the highest level of the hierarchy
> > > > > > > and can therefore accumulate quite a lot of data when summing
> > > > > > > the counter by region.
> > > > > > >
> > > > > > > I am running a query like this:
> > > > > > >
> > > > > > > select sum(MY_COUNTER) as tot_count, REGION
> > > > > > > from FACT_TABLE
> > > > > > > join LOOKUP_TABLE
> > > > > > > on FACT_TABLE.FK = LOOKUP_TABLE.PK
> > > > > > > where date = '2015-01-01'
> > > > > > > group by REGION
> > > > > > >
> > > > > > > I get the following result:
> > > > > > >
> > > > > > > -917,164,421 EMEA --> negative result!
> > > > > > > 777,795,808 AMER
> > > > > > > 514,879,134 APAC
> > > > > > > 117,814,485 LATAM
> > > > > > >
> > > > > > > I ran the exact same query in Hive and got the following result:
> > > > > > >
> > > > > > > 3,381,591,228 EMEA  --> big number but smaller than the BIGINT limit
> > > > > > > 778,718,563 AMER    --> 922,755 difference from the Kylin result
> > > > > > > 520,253,610 APAC    --> 5,374,476 difference from the Kylin result
> > > > > > > 117,913,857 LATAM   --> 99,372 difference from the Kylin result
> > > > > > >
> > > > > > > Based on that, it seems the limit is the int limit 2^31-1
> > > > > > > (2,147,483,647) and that my negative result comes from an
> > > > > > > overflow.
> > > > > > >
> > > > > > > In my cube I defined this measure as a bigint:
> > > > > > >
> > > > > > > Expression | Param Type | Param Value | Return Type
> > > > > > > SUM        | column     | AVAIL_COUNT | bigint
> > > > > > >
> > > > > > > The other thing that worries me a bit is that the other
> > > > > > > numbers are not 100% accurate either (up to a 5,374,476
> > > > > > > difference!).
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > *Bin Mahone | 马洪宾*
> > > > > Apache Kylin: http://kylin.io
> > > > > Github: https://github.com/binmahone
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
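One way to sanity-check the overflow theory: 3,381,591,228 - 2^32 = -913,376,068, which is within about 3.8 million of the -917,164,421 Kylin returned, i.e. the same order as the per-region deltas quoted above. The truncation can be reproduced directly in Hive; a minimal sketch, assuming Hive's CAST follows Java narrowing semantics (keeping the low 32 bits):

-- the Hive EMEA total forced into a 32-bit int
select cast(3381591228 as int);    -- expected: -913376068
-- equivalently, subtract 2^32
select 3381591228 - 4294967296;    -- -913376068

So the sign flip is fully explained by a 32-bit accumulator somewhere; the smaller per-region deltas look like a separate issue.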
