The test table is in billions record and in the hundreds of GB. For this test query however I am restricting to only 1 day of data which makes it some millions records.
Not possible to export and share I'm afraid as it's sensitive information. On Thu, Aug 6, 2015 at 10:42 AM, Li Yang <[email protected]> wrote: > Interesting.. I'm thinking how to reproduce your problem then we can help.. > > How big is the test hive table? Possible to export and share with us? > > On Thu, Aug 6, 2015 at 5:23 PM, alex schufo <[email protected]> wrote: > > > This is a QA environment where I fully control the Hive tables and I > loaded > > this dataset manually and didn't make any change since. The cube itself > has > > been created yesterday so no I don't see any Hive update after the cube > > build. > > > > I didn't have the chance to modify the Hive table use BIGINT as well but > > will do when I have time. > > > > On Thu, Aug 6, 2015 at 10:17 AM, Li Yang <[email protected]> wrote: > > > > > Integer overflow is possible. I do remember some FIXME there. > > > > > > Incorrect SUM is quite impossible. Regression test reconciles SUM > result > > > with H2 database every day. Maybe the hive table is updated after cube > > > build? Try refresh cube to ensure it captures latest hive data. > > > > > > > > > > > > On Thu, Aug 6, 2015 at 9:23 AM, hongbin ma <[email protected]> > wrote: > > > > > > > I'll try to reproduce the issue in my own environment > > > > > > > > On Wed, Aug 5, 2015 at 11:51 PM, Luke Han <[email protected]> wrote: > > > > > > > > > That's fine, please change your hive schema first to make sure they > > are > > > > > same, > > > > > then try again. > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > Best Regards! > > > > > --------------------- > > > > > > > > > > Luke Han > > > > > > > > > > On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected] > > > > > > wrote: > > > > > > > > > > > I am using 0.7.2. > > > > > > > > > > > > The Hive column is int, does that mean I should try to modify the > > > > > existing > > > > > > Hive table to move to Bigint as well to have the Kylin count as a > > > > Bigint? > > > > > > > > > > > > I probably won't be allowed to share this kind of data outside my > > > > > > organization unfortunately. > > > > > > > > > > > > On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected] > > > > > > wrote: > > > > > > > > > > > > > except for what Luke asked, can you also share which version > are > > > you > > > > > > using? > > > > > > > We'll look into this > > > > > > > > > > > > > > On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]> > > > wrote: > > > > > > > > > > > > > > > What's the data type in your hive table of that column? > > > > > > > > We recommend to use same as your measure definition. > > > > > > > > > > > > > > > > For data quality, we have approach to ensure it will 100% > > > > > > > > match with source data, there's H2 comparison in our testing > > > case. > > > > > > > > But yes, let's double check the test case about huge number > > like > > > > your > > > > > > > case > > > > > > > > > > > > > > > > Is it possible to share your data and model to us if not > > > sensitive? > > > > > > > > It will be better to using that to do the verify and testing > > > here. > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Best Regards! > > > > > > > > --------------------- > > > > > > > > > > > > > > > > Luke Han > > > > > > > > > > > > > > > > On Wed, Aug 5, 2015 at 10:57 PM, alex schufo < > > > [email protected] > > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > I have a simple cube with a Fact Table and a Lookup Table. > > The > > > > Fact > > > > > > > Table > > > > > > > > > includes some counter. The Lookup Table includes a region > > > > property > > > > > > > which > > > > > > > > is > > > > > > > > > the highest level of the hierarchy and therefore can > > accumulate > > > > > > quite a > > > > > > > > lot > > > > > > > > > of data when summing the counter by region. > > > > > > > > > > > > > > > > > > I am doing a query like this: > > > > > > > > > > > > > > > > > > select sum(MY_COUNTER) as tot_count, REGION > > > > > > > > > > > > > > > > > > from FACT_TABLE > > > > > > > > > > > > > > > > > > join LOOKUP_TABLE > > > > > > > > > > > > > > > > > > on FACT_TABLE.FK = LOOKUP_TABLE.PK > > > > > > > > > > > > > > > > > > where date = '2015-01-01' > > > > > > > > > > > > > > > > > > group by REGION > > > > > > > > > > > > > > > > > > > > > > > > > > > I get the following result: > > > > > > > > > > > > > > > > > > -917,164,421 EMEA --> negative result! > > > > > > > > > > > > > > > > > > 777,795,808 AMER > > > > > > > > > > > > > > > > > > 514,879,134 APAC > > > > > > > > > > > > > > > > > > 117,814,485 LATAM > > > > > > > > > > > > > > > > > > I ran the exact same query in Hive and got the following > > > result: > > > > > > > > > > > > > > > > > > 3,381,591,228 EMEA --> big number but smaller than the > > BIGINT > > > > > limit > > > > > > > > > > > > > > > > > > 778,718,563 AMER --> 922,755 difference with the Kylin > > > result > > > > > > > > > > > > > > > > > > 520,253,610 APAC --> 5,374,476 difference with the > Kylin > > > > > result > > > > > > > > > > > > > > > > > > 117,913,857 LATAM --> 99,372 difference with the Kylin > > > result > > > > > > > > > > > > > > > > > > Based on that it seems that the limit is the int limit > 2^31-1 > > > > > > > > > (2,147,483,647) and that my negative result comes from an > > > > overflow. > > > > > > > > > > > > > > > > > > In my cube I defined this measure as a bigint: > > > > > > > > > > > > > > > > > > Expression Param Type Param Value Return Type > > > > > > > > > > > > > > > > > > SUM column AVAIL_COUNT bigint > > > > > > > > > > > > > > > > > > The other thing that worries me a bit is that the other > > numbers > > > > are > > > > > > not > > > > > > > > > 100% accurate either (>100,000,000 difference!) > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > > Regards, > > > > > > > > > > > > > > *Bin Mahone | 马洪宾* > > > > > > > Apache Kylin: http://kylin.io > > > > > > > Github: https://github.com/binmahone > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Regards, > > > > > > > > *Bin Mahone | 马洪宾* > > > > Apache Kylin: http://kylin.io > > > > Github: https://github.com/binmahone > > > > > > > > > >
