[jira] [Commented] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

Richard Calaba (JIRA) Tue, 14 Jun 2016 16:59:21 -0700

    [ 
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330891#comment-15330891
 ]


Richard Calaba commented on KYLIN-1576:
---------------------------------------

One workaround - not generic - for this temporal join which is composed from:
  - equality join on entity ID (id1, ... id-N)
  - 2 non-equality joins on entity validity (date_from, date_to) 

is to define new fact-table which includes the original fact table and 
time-dependent atributes.

So the old join condition in the model:

  - FROM fact_table
                LEFT OUTER JOIN time_dependent_attrs 
                ON fact_table.id1 = time_dependent_attrs.id1 (AND  
fact_table.id-N = time_dependent_attrs.id-N)*
                AND fact_table.transaction_date <= time_dependent_attrs.date_to 
                AND fact_table.transaction_date >= 
time_dependent_attrs.date_from

 you can define new fact table this way to achieve same logic:

create table/view fact_table_new AS 
        SELECT fact_table.*, 
                timedep.attr1,  timedep.attr2, .... 
        FROM fact_table AS fact
             LEFT OUTER JOIN time_dependent_attrs AS timedep
             ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.customer_id-N)* 
  
        WHERE fact.transaction_date BETWEEN timedep.date_from AND 
timedep.date_to;

The draw-back of this solution: You will make all time-dependent attributes (if 
needed to be used for grouping) as separate Normal dimensions - Kylin cannot 
utilize the optimized logic for Derrrived dimensions. So this solution is 
practical only for small amount of time-dependent attributes.

The 2nd workaround is (instead of creating new fact table) to create new 
dimension table (lookup table) where you can map the records from t to the keys 
of the original fact-table:

create table/view new_dim_table AS 
        SELECT fact_table.id-1, (fact_table.id-N)*,  
                timedep.*
        FROM fact_table AS fact
             LEFT OUTER JOIN time_dependent_attrs AS timedep
             ON fact.id1 = timedep.id1 (AND fact.id-N = timedep.id-N)*   
        WHERE fact.transaction_date BETWEEN timedep.date_from AND 
timedep.date_to;

And then you can use Kylin Modem to define:
 
     fact_table INNER JOIN new_dim_table    (you do not have to use LEFT OUTER 
JOIN here anymore)
        ON fact_table.id1 = new_dim_table.id1 (AND fact_table.id-N = 
new_dim_table.id-N)* 

This way you will get Dim table of same size as Fact tabe -> but you can still 
utilize Derrived dimensions benefits in Kylin.
 

> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
>                 Key: KYLIN-1576
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1576
>             Project: Kylin
>          Issue Type: New Feature
>          Components: General
>    Affects Versions: Future
>            Reporter: Richard Calaba
>            Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios. 
> Typically modeled with granularity 1 day (datefrom, dateto fields of type 
> DATE defining validity time of one master data record). Occasionally you can 
> think of lower granularity so use of TIMESTAMP can be also seen as an valid 
> scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =========================
> KEY: PRODUCT_ID, DATE_TO, 
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is 
> assumed that DATE_TO <= DATE_TO and also that there are no overlapping 
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the 
> statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= 
> today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' 
> PRODUCT master data (description). The today/now is also being named as 'key 
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, ....
> I would like to show the Sold Products at Store at certain date and also show 
> the Description of the product at the date of product sale (assuming here 
> that there is product catalog which can be updated independently, but for 
> auditing purposes the original product description used during sale is needed 
> to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION 
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P 
> ON S.PRODUCT_ID = P.PRODUCT_ID 
> AND S.SALES_DATE >= P.DATE_FROM AND 
> AND S.SALES_DATE <= P.DATE_TO 
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but 
> in this case it won't be the proper join - we need to show the product sales 
> even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here - 
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
>  )
> This scenario can be supported by Kylin if following enhancement would be 
> done:
> 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER 
> joins (suggesting name temporal (left outer/inner) join) which forces to 
> specify a 'key date' as a expression (column / constant / ...) from the FACT 
> table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP 
> table/ Those 2 validity fields are defining master data record validity 
> period. Supported types for those fields should be DATE, optionally TIMESTAMP 
> is also fine but rarely used in business scenarios. 
> Other option rather then defining new join type is to loosen the join 
> condition and allowing <= and >= operands to be used as part of the LOOKUP 
> join definition.
> 2) The Cube Definition then needs to know the extension of the join type in 
> the cube model and needs to force the additional fields (key-date, 
> valid-from, valid-to) be part of the whole cube structure.  Or alternatively 
> cube definition for derived dimensions can be extended to define a 
> "time-dependent derived lookup" similar way as described in the step 1) for 
> the suggested cube model join type extension. 
> 3) Very often the time-partition field of the cube which is being used for 
> incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be 
> hard-coded this way as this is not true for every scenario. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

Reply via email to