Re: Incremental builds assumptions and clarifications

Li Yang Thu, 24 Dec 2015 18:06:05 -0800

Hmm, I don't think Luke fully answered all the questions. My additions:

>    Is there a document explaining the assumptions for incremental builds.
The only assumption (or requirement) is that there is a date or timestamp
column on the fact table that distinguishes old data from new.
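
To make that concrete, here is a rough sketch in plain Python (not Kylin
code; the row layout and the rows_for_segment helper are made up for
illustration). It only shows how a date column lets each incremental build
pick up its own slice of the fact table, treating a build as a half-open
date range:

```python
from datetime import date

# Hypothetical fact rows: (transaction date, category id, quantity)
FACT_ROWS = [
    (date(2015, 12, 22), 20, 10),
    (date(2015, 12, 23), 20, 5),
    (date(2015, 12, 24), 30, 7),
]

def rows_for_segment(rows, start, end):
    """Return the fact rows whose date falls in [start, end)."""
    return [r for r in rows if start <= r[0] < end]

# The first build covers everything before Dec 24. The next incremental
# build only needs the rows dated on or after the previous end date.
first = rows_for_segment(FACT_ROWS, date(2015, 12, 1), date(2015, 12, 24))
second = rows_for_segment(FACT_ROWS, date(2015, 12, 24), date(2015, 12, 25))
```

Because the ranges do not overlap, no row is counted twice across builds.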


>    Do we allow 'updates' on a facts ?
> 1) Because of some typo the quantity came in as 100 instead of 10. What is
> the suggested approach to handle this.
So you want to refresh a piece of data that has already been built. Yes,
that's doable. Kylin cuts a cube into segments by time period. You can
refresh (or rebuild) one segment without impacting the rest.
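
A toy model of the idea, in plain Python (this is not how Kylin stores
cuboids; build_segment, query and the day names are hypothetical). Each
segment keeps its own pre-aggregated totals, so fixing the typo means
rebuilding only the segment that contains the bad row:

```python
from collections import defaultdict

def build_segment(rows):
    """Aggregate quantity per category for one segment's fact rows."""
    totals = defaultdict(int)
    for category, quantity in rows:
        totals[category] += quantity
    return dict(totals)

def query(segments):
    """Answer a query by summing the per-segment aggregates."""
    result = defaultdict(int)
    for totals in segments.values():
        for category, quantity in totals.items():
            result[category] += quantity
    return dict(result)

# Fact rows per day as (category, quantity). Day 2 has the typo: 100, not 10.
facts = {
    "day1": [("c20", 5)],
    "day2": [("c20", 100)],
    "day3": [("c20", 7)],
}
segments = {day: build_segment(rows) for day, rows in facts.items()}
before = query(segments)

# Fix the source data, then rebuild only the affected segment.
facts["day2"] = [("c20", 10)]
segments["day2"] = build_segment(facts["day2"])
after = query(segments)
```

The day1 and day3 aggregates are never touched by the refresh.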

> 2) Let's say the value for dimension 1 was d1 in the fact table. Now it
> got updated to d2 for the same dimension. How does it 'deduct' from the
> aggregation for d1 for all cuboids and 'accumulate' for d2 in all cuboids.
>
>    How to support Slowly Changing Dimensions (SCD). Support for type 2 and
> type 3.
By design, Kylin remembers data as it was at the point the segment was
built. So you may build a daily segment on day T with category set C in the
lookup table; then on day T+1 the category lookup table is updated to C~,
and you build the T+1 daily segment with that. Now if you query the cube, it
will report categories from both C and C~. More precisely, Kylin will return
C for day T transactions and C~ for day T+1 transactions.

If what you want is to reflect C~ in historical data, then the earlier
segments have to be rebuilt.
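
The behaviour above can be sketched like this, again as a plain Python toy
and not Kylin internals (build_segment, the product id p1 and the amounts
are assumptions for illustration). Each build copies the lookup table as it
is at that moment, so a later change only shows up in segments built, or
rebuilt, afterwards:

```python
def build_segment(fact_rows, lookup):
    """Aggregate amount per category name, resolving names through the
    lookup table as it looks at build time."""
    snapshot = dict(lookup)  # copy, so later lookup changes do not leak in
    totals = {}
    for product_id, amount in fact_rows:
        name = snapshot[product_id]
        totals[name] = totals.get(name, 0) + amount
    return totals

lookup = {"p1": "C"}
facts_t = [("p1", 10)]   # day T transactions
facts_t1 = [("p1", 4)]   # day T+1 transactions

seg_t = build_segment(facts_t, lookup)    # built on day T, sees C
lookup["p1"] = "C~"                       # lookup table is updated
seg_t1 = build_segment(facts_t1, lookup)  # built on day T+1, sees C~

# To reflect C~ in the historical data, rebuild the day T segment.
seg_t_rebuilt = build_segment(facts_t, lookup)
```

Until the rebuild, seg_t still reports category C even though the lookup
table no longer contains it.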

On Thu, Dec 24, 2015 at 10:59 PM, Luke Han <luke...@gmail.com> wrote:

> Hi Abhilash,
>     Please refer to below comments inline.
>
>     Thanks.
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Thu, Dec 10, 2015 at 2:28 PM, Abhilash L L <abhil...@infoworks.io>
> wrote:
>
> > Hello,
> >
> >    Is there a document explaining the assumptions for incremental builds.
> > *Luke: I'm afraid there's no such doc yet. What exactly is the
> > "assumption" you are looking for: the code-level implementation, or
> > how to optimize?*
>
>
>
> >
> >    Is it purely additive? Let's say category id is one of my row key
> > components. I had 10 products on category id 20. Now I got a new product
> > for same category would it add up. Would distinct count also be fine ?
> >
> *      Luke: Kylin performs very well in such cases: it will add up to 21,
> and the same goes for distinct count, although the distinct count result
> is approximate.*
>
> >
> >    Do we allow 'updates' on a facts ?
> > 1) Because of some typo the quantity came in as 100 instead of 10. What
> is
> > the suggested approach to handle this.
> >
>        Luke: Do you mean data model changes? Then you have to disable that
> cube, purge the data and refine it, then rebuild it.
>
> 2) Let's say the value for dimension 1 was d1 in the fact table. Now it
> > got updated to d2 for the same dimension. How does it 'deduct' from the
> > aggregation for d1 for all cuboids and 'accumulate' for d2 in all
> cuboids.
> >
> >    How to support Slowly Changing Dimensions (SCD). Support for type 2
> and
> > type 3.
> >
> *      Luke: Kylin does not support SCD very well yet.*
>
> >
> >    How to support deletes in fact / dimension ?
> >
> *      Luke: deletes in the fact table are fine, but be careful with deletes
> in a dimension; they will probably require a rebuild.*
>
> >
> >
> >    If there's a document explaining this already, it would help us and a lot of
> > people.
> >
> >
> > Regards,
> > Abhilash
> >
>
