What's the data type of that column in your Hive table? We recommend using the same type as in your measure definition.
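
As a quick illustration of the wraparound (a sketch only, assuming Hive's Java-style narrowing semantics for CAST; the figures come from your results quoted below):

    -- A value above 2^31-1 wraps modulo 2^32 when forced into a 32-bit int.
    -- Hive follows Java narrowing semantics on CAST, so:
    SELECT CAST(3381591228 AS INT);     -- 3381591228 - 2^32 = -913376068
    SELECT CAST(3381591228 AS BIGINT);  -- 3381591228, no wraparound

Unwrapping your Kylin figure the same way, -917,164,421 + 2^32 = 3,377,802,875, which is about 3.8 million below the Hive total for EMEA. That gap is the same order of magnitude as the discrepancies you see for the other regions, so the overflow and the smaller mismatches look like two separate issues.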
For data quality, we have an approach to ensure results match the source data 100%; there is an H2 comparison in our test cases. But yes, let's double-check the test cases for huge numbers like the ones in your case.

Would it be possible to share your data and model with us, if they are not sensitive? That would make it easier to verify and test here. Thanks.

Best Regards!
---------------------
Luke Han

On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <[email protected]> wrote:

> I have a simple cube with a fact table and a lookup table. The fact table
> includes some counters. The lookup table includes a region property, which
> is the highest level of the hierarchy and can therefore accumulate quite a
> lot of data when summing a counter by region.
>
> I am doing a query like this:
>
>   select sum(MY_COUNTER) as tot_count, REGION
>   from FACT_TABLE
>   join LOOKUP_TABLE
>   on FACT_TABLE.FK = LOOKUP_TABLE.PK
>   where date = '2015-01-01'
>   group by REGION
>
> I get the following result:
>
>   -917,164,421   EMEA   --> negative result!
>    777,795,808   AMER
>    514,879,134   APAC
>    117,814,485   LATAM
>
> I ran the exact same query in Hive and got the following result:
>
>   3,381,591,228  EMEA   --> big number, but smaller than the BIGINT limit
>    778,718,563   AMER   --> 922,755 difference from the Kylin result
>    520,253,610   APAC   --> 5,374,476 difference from the Kylin result
>    117,913,857   LATAM  --> 99,372 difference from the Kylin result
>
> Based on that, it seems the limit is the int limit 2^31-1 (2,147,483,647)
> and that my negative result comes from an overflow.
>
> In my cube I defined this measure as a bigint:
>
>   Expression  Param Type  Param Value  Return Type
>   SUM         column      AVAIL_COUNT  bigint
>
> The other thing that worries me a bit is that the other numbers are not
> 100% accurate either (differences of up to more than 5,000,000!).
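
For reference, a cross-check along these lines could be run in Hive (a sketch using the table and column names from the query quoted above; the explicit cast ensures the aggregation happens at bigint precision regardless of the column's declared type):

    -- Re-run the aggregation with an explicit bigint cast on the counter,
    -- so the comparison against Kylin is made at the same precision.
    select sum(cast(MY_COUNTER as bigint)) as tot_count, REGION
    from FACT_TABLE
    join LOOKUP_TABLE
    on FACT_TABLE.FK = LOOKUP_TABLE.PK
    where date = '2015-01-01'
    group by REGION;

If this still matches the original Hive totals, the Hive side is clean and the remaining discrepancies are on the cube side.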
