[ 
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Billy(Yiming) Liu reassigned KYLIN-1576:
----------------------------------------

    Assignee: Billy(Yiming) Liu  (was: Yiming Liu)

> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
>                 Key: KYLIN-1576
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1576
>             Project: Kylin
>          Issue Type: New Feature
>          Components: General
>    Affects Versions: Future
>            Reporter: Richard Calaba
>            Assignee: Billy(Yiming) Liu
>            Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios. 
> Typically modeled with granularity 1 day (datefrom, dateto fields of type 
> DATE defining validity time of one master data record). Occasionally you can 
> think of lower granularity so use of TIMESTAMP can be also seen as an valid 
> scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =========================
> KEY: PRODUCT_ID, DATE_TO, 
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is 
> assumed that DATE_TO <= DATE_TO and also that there are no overlapping 
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the 
> statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= 
> today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' 
> PRODUCT master data (description). The today/now is also being named as 'key 
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, ....
> I would like to show the Sold Products at Store at certain date and also show 
> the Description of the product at the date of product sale (assuming here 
> that there is product catalog which can be updated independently, but for 
> auditing purposes the original product description used during sale is needed 
> to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION 
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P 
> ON S.PRODUCT_ID = P.PRODUCT_ID 
> AND S.SALES_DATE >= P.DATE_FROM AND 
> AND S.SALES_DATE <= P.DATE_TO 
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but 
> in this case it won't be the proper join - we need to show the product sales 
> even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here - 
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
>  )
> This scenario can be supported by Kylin if following enhancement would be 
> done:
> 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER 
> joins (suggesting name temporal (left outer/inner) join) which forces to 
> specify a 'key date' as a expression (column / constant / ...) from the FACT 
> table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP 
> table/ Those 2 validity fields are defining master data record validity 
> period. Supported types for those fields should be DATE, optionally TIMESTAMP 
> is also fine but rarely used in business scenarios. 
> Other option rather then defining new join type is to loosen the join 
> condition and allowing <= and >= operands to be used as part of the LOOKUP 
> join definition.
> 2) The Cube Definition then needs to know the extension of the join type in 
> the cube model and needs to force the additional fields (key-date, 
> valid-from, valid-to) be part of the whole cube structure.  Or alternatively 
> cube definition for derived dimensions can be extended to define a 
> "time-dependent derived lookup" similar way as described in the step 1) for 
> the suggested cube model join type extension. 
> 3) Very often the time-partition field of the cube which is being used for 
> incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be 
> hard-coded this way as this is not true for every scenario. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to