The test table is in billions record and in the hundreds of GB. For this
test query however I am restricting to only 1 day of data which makes it
some millions records.

Not possible to export and share I'm afraid as it's sensitive information.

On Thu, Aug 6, 2015 at 10:42 AM, Li Yang <[email protected]> wrote:

> Interesting.. I'm thinking how to reproduce your problem then we can help..
>
> How big is the test hive table? Possible to export and share with us?
>
> On Thu, Aug 6, 2015 at 5:23 PM, alex schufo <[email protected]> wrote:
>
> > This is a QA environment where I fully control the Hive tables and I
> loaded
> > this dataset manually and didn't make any change since. The cube itself
> has
> > been created yesterday so no I don't see any Hive update after the cube
> > build.
> >
> > I didn't have the chance to modify the Hive table use BIGINT as well but
> > will do when I have time.
> >
> > On Thu, Aug 6, 2015 at 10:17 AM, Li Yang <[email protected]> wrote:
> >
> > > Integer overflow is possible. I do remember some FIXME there.
> > >
> > > Incorrect SUM is quite impossible. Regression test reconciles SUM
> result
> > > with H2 database every day. Maybe the hive table is updated after cube
> > > build? Try refresh cube to ensure it captures latest hive data.
> > >
> > >
> > >
> > > On Thu, Aug 6, 2015 at 9:23 AM, hongbin ma <[email protected]>
> wrote:
> > >
> > > > I'll try to reproduce the issue in my own environment
> > > >
> > > > On Wed, Aug 5, 2015 at 11:51 PM, Luke Han <[email protected]> wrote:
> > > >
> > > > > That's fine, please change your hive schema first to make sure they
> > are
> > > > > same,
> > > > > then try again.
> > > > >
> > > > > Thanks.
> > > > >
> > > > >
> > > > >
> > > > > Best Regards!
> > > > > ---------------------
> > > > >
> > > > > Luke Han
> > > > >
> > > > > On Wed, Aug 5, 2015 at 11:29 PM, alex schufo <[email protected]
> >
> > > > wrote:
> > > > >
> > > > > > I am using 0.7.2.
> > > > > >
> > > > > > The Hive column is int, does that mean I should try to modify the
> > > > > existing
> > > > > > Hive table to move to Bigint as well to have the Kylin count as a
> > > > Bigint?
> > > > > >
> > > > > > I probably won't be allowed to share this kind of data outside my
> > > > > > organization unfortunately.
> > > > > >
> > > > > > On Wed, Aug 5, 2015 at 4:22 PM, hongbin ma <[email protected]
> >
> > > > wrote:
> > > > > >
> > > > > > > except for what Luke asked, can you also share which version
> are
> > > you
> > > > > > using?
> > > > > > > We'll look into this
> > > > > > >
> > > > > > > On Wed, Aug 5, 2015 at 11:08 PM, Luke Han <[email protected]>
> > > wrote:
> > > > > > >
> > > > > > > > What's the data type in your hive table of that column?
> > > > > > > > We recommend to use same as your measure definition.
> > > > > > > >
> > > > > > > > For data quality, we have approach to ensure it will 100%
> > > > > > > > match with source data, there's H2 comparison in our testing
> > > case.
> > > > > > > > But yes, let's double check the test case about huge number
> > like
> > > > your
> > > > > > > case
> > > > > > > >
> > > > > > > > Is it possible to share your data and model to us if not
> > > sensitive?
> > > > > > > > It will be better to using that to do the verify and testing
> > > here.
> > > > > > > >
> > > > > > > > Thanks.
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Best Regards!
> > > > > > > > ---------------------
> > > > > > > >
> > > > > > > > Luke Han
> > > > > > > >
> > > > > > > > On Wed, Aug 5, 2015 at 10:57 PM, alex schufo <
> > > [email protected]
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > I have a simple cube with a Fact Table and a Lookup Table.
> > The
> > > > Fact
> > > > > > > Table
> > > > > > > > > includes some counter. The Lookup Table includes a region
> > > > property
> > > > > > > which
> > > > > > > > is
> > > > > > > > > the highest level of the hierarchy and therefore can
> > accumulate
> > > > > > quite a
> > > > > > > > lot
> > > > > > > > > of data when summing the counter by region.
> > > > > > > > >
> > > > > > > > > I am doing a query like this:
> > > > > > > > >
> > > > > > > > > select sum(MY_COUNTER) as tot_count, REGION
> > > > > > > > >
> > > > > > > > > from FACT_TABLE
> > > > > > > > >
> > > > > > > > > join LOOKUP_TABLE
> > > > > > > > >
> > > > > > > > > on FACT_TABLE.FK = LOOKUP_TABLE.PK
> > > > > > > > >
> > > > > > > > > where date = '2015-01-01'
> > > > > > > > >
> > > > > > > > > group by REGION
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > I get the following result:
> > > > > > > > >
> > > > > > > > > -917,164,421 EMEA --> negative result!
> > > > > > > > >
> > > > > > > > > 777,795,808 AMER
> > > > > > > > >
> > > > > > > > > 514,879,134 APAC
> > > > > > > > >
> > > > > > > > > 117,814,485 LATAM
> > > > > > > > >
> > > > > > > > > I ran the exact same query in Hive and got the following
> > > result:
> > > > > > > > >
> > > > > > > > > 3,381,591,228  EMEA  --> big number but smaller than the
> > BIGINT
> > > > > limit
> > > > > > > > >
> > > > > > > > > 778,718,563    AMER  --> 922,755 difference with the Kylin
> > > result
> > > > > > > > >
> > > > > > > > > 520,253,610    APAC   --> 5,374,476 difference with the
> Kylin
> > > > > result
> > > > > > > > >
> > > > > > > > > 117,913,857    LATAM --> 99,372 difference with the Kylin
> > > result
> > > > > > > > >
> > > > > > > > > Based on that it seems that the limit is the int limit
> 2^31-1
> > > > > > > > > (2,147,483,647) and that my negative result comes from an
> > > > overflow.
> > > > > > > > >
> > > > > > > > > In my cube I defined this measure as a bigint:
> > > > > > > > >
> > > > > > > > > Expression    Param Type    Param Value       Return Type
> > > > > > > > >
> > > > > > > > > SUM              column            AVAIL_COUNT    bigint
> > > > > > > > >
> > > > > > > > > The other thing that worries me a bit is that the other
> > numbers
> > > > are
> > > > > > not
> > > > > > > > > 100% accurate either (>100,000,000 difference!)
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Regards,
> > > > > > >
> > > > > > > *Bin Mahone | 马洪宾*
> > > > > > > Apache Kylin: http://kylin.io
> > > > > > > Github: https://github.com/binmahone
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > *Bin Mahone | 马洪宾*
> > > > Apache Kylin: http://kylin.io
> > > > Github: https://github.com/binmahone
> > > >
> > >
> >
>

Reply via email to