Integer overflow is possible. I do remember some FIXME there. Incorrect SUM is quite impossible. Regression test reconciles SUM result with H2 database every day. Maybe the hive table is updated after cube build? Try refresh cube to ensure it captures latest hive data.
On Thu, Aug 6, 2015 at 9:23 AM, hongbin ma <[email protected]> wrote: > I'll try to reproduce the issue in my own environment > > On Wed, Aug 5, 2015 at 11:51 PM, Luke Han <[email protected]> wrote: > > > That's fine, please change your hive schema first to make sure they are > > same, > > then try again. > > > > Thanks. > > > > > > > > Best Regards! > > --------------------- > > > > Luke Han > > > > On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected]> > wrote: > > > > > I am using 0.7.2. > > > > > > The Hive column is int, does that mean I should try to modify the > > existing > > > Hive table to move to Bigint as well to have the Kylin count as a > Bigint? > > > > > > I probably won't be allowed to share this kind of data outside my > > > organization unfortunately. > > > > > > On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]> > wrote: > > > > > > > except for what Luke asked, can you also share which version are you > > > using? > > > > We'll look into this > > > > > > > > On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> wrote: > > > > > > > > > What's the data type in your hive table of that column? > > > > > We recommend to use same as your measure definition. > > > > > > > > > > For data quality, we have approach to ensure it will 100% > > > > > match with source data, there's H2 comparison in our testing case. > > > > > But yes, let's double check the test case about huge number like > your > > > > case > > > > > > > > > > Is it possible to share your data and model to us if not sensitive? > > > > > It will be better to using that to do the verify and testing here. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards! > > > > > --------------------- > > > > > > > > > > Luke Han > > > > > > > > > > On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected] > > > > > > wrote: > > > > > > > > > > > I have a simple cube with a Fact Table and a Lookup Table. The > Fact > > > > Table > > > > > > includes some counter. The Lookup Table includes a region > property > > > > which > > > > > is > > > > > > the highest level of the hierarchy and therefore can accumulate > > > quite a > > > > > lot > > > > > > of data when summing the counter by region. > > > > > > > > > > > > I am doing a query like this: > > > > > > > > > > > > select sum(MY_COUNTER) as tot_count, REGION > > > > > > > > > > > > from FACT_TABLE > > > > > > > > > > > > join LOOKUP_TABLE > > > > > > > > > > > > on FACT_TABLE.FK = LOOKUP_TABLE.PK > > > > > > > > > > > > where date = '2015-01-01' > > > > > > > > > > > > group by REGION > > > > > > > > > > > > > > > > > > I get the following result: > > > > > > > > > > > > -917,164,421 EMEA --> negative result! > > > > > > > > > > > > 777,795,808 AMER > > > > > > > > > > > > 514,879,134 APAC > > > > > > > > > > > > 117,814,485 LATAM > > > > > > > > > > > > I ran the exact same query in Hive and got the following result: > > > > > > > > > > > > 3,381,591,228 EMEA --> big number but smaller than the BIGINT > > limit > > > > > > > > > > > > 778,718,563 AMER --> 922,755 difference with the Kylin result > > > > > > > > > > > > 520,253,610 APAC --> 5,374,476 difference with the Kylin > > result > > > > > > > > > > > > 117,913,857 LATAM --> 99,372 difference with the Kylin result > > > > > > > > > > > > Based on that it seems that the limit is the int limit 2^31-1 > > > > > > (2,147,483,647) and that my negative result comes from an > overflow. > > > > > > > > > > > > In my cube I defined this measure as a bigint: > > > > > > > > > > > > Expression Param Type Param Value Return Type > > > > > > > > > > > > SUM column AVAIL_COUNT bigint > > > > > > > > > > > > The other thing that worries me a bit is that the other numbers > are > > > not > > > > > > 100% accurate either (>100,000,000 difference!) > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > > > > > *Bin Mahone | 马洪宾* > > > > Apache Kylin: http://kylin.io > > > > Github: https://github.com/binmahone > > > > > > > > > > > > > -- > Regards, > > *Bin Mahone | 马洪宾* > Apache Kylin: http://kylin.io > Github: https://github.com/binmahone >
