[
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Billy(Yiming) Liu reassigned KYLIN-1576:
----------------------------------------
Assignee: Billy(Yiming) Liu (was: Yiming Liu)
> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
> Key: KYLIN-1576
> URL: https://issues.apache.org/jira/browse/KYLIN-1576
> Project: Kylin
> Issue Type: New Feature
> Components: General
> Affects Versions: Future
> Reporter: Richard Calaba
> Assignee: Billy(Yiming) Liu
> Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios.
> Typically modeled with granularity 1 day (datefrom, dateto fields of type
> DATE defining validity time of one master data record). Occasionally you can
> think of lower granularity so use of TIMESTAMP can be also seen as an valid
> scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =========================
> KEY: PRODUCT_ID, DATE_TO,
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is
> assumed that DATE_TO <= DATE_TO and also that there are no overlapping
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the
> statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >=
> today/now and DATE_FROM <= today/now is efficient way to retrieve 'current'
> PRODUCT master data (description). The today/now is also being named as 'key
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, ....
> I would like to show the Sold Products at Store at certain date and also show
> the Description of the product at the date of product sale (assuming here
> that there is product catalog which can be updated independently, but for
> auditing purposes the original product description used during sale is needed
> to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P
> ON S.PRODUCT_ID = P.PRODUCT_ID
> AND S.SALES_DATE >= P.DATE_FROM AND
> AND S.SALES_DATE <= P.DATE_TO
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but
> in this case it won't be the proper join - we need to show the product sales
> even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here -
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
> )
> This scenario can be supported by Kylin if following enhancement would be
> done:
> 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER
> joins (suggesting name temporal (left outer/inner) join) which forces to
> specify a 'key date' as a expression (column / constant / ...) from the FACT
> table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP
> table/ Those 2 validity fields are defining master data record validity
> period. Supported types for those fields should be DATE, optionally TIMESTAMP
> is also fine but rarely used in business scenarios.
> Other option rather then defining new join type is to loosen the join
> condition and allowing <= and >= operands to be used as part of the LOOKUP
> join definition.
> 2) The Cube Definition then needs to know the extension of the join type in
> the cube model and needs to force the additional fields (key-date,
> valid-from, valid-to) be part of the whole cube structure. Or alternatively
> cube definition for derived dimensions can be extended to define a
> "time-dependent derived lookup" similar way as described in the step 1) for
> the suggested cube model join type extension.
> 3) Very often the time-partition field of the cube which is being used for
> incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be
> hard-coded this way as this is not true for every scenario.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)