Richard Calaba created KYLIN-1576:
-------------------------------------
Summary: Support of new join type in the Cube Modem - Temporal Join
Key: KYLIN-1576
URL: https://issues.apache.org/jira/browse/KYLIN-1576
Project: Kylin
Issue Type: New Feature
Components: General
Affects Versions: Future
Reporter: Richard Calaba
There is a notion of time-dependent master data in many business scenarios.
Typically modeled with granularity 1 day (datefrom, dateto fields of type DATE
defining validity time of one master data record). Occasionally you can think
of lower granularity so use of TIMESTAMP can be also seen as an valid
scenario). Example of such master data definition could be:
Master Data / Dimension Table:
=========================
KEY: PRODUCT_ID, DATE_TO,
NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
- assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is
assumed that DATE_TO <= DATE_TO and also that there are no overlapping
intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
- the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the
statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= today/now
and DATE_FROM <= today/now is efficient way to retrieve 'current' PRODUCT
master data (description). The today/now is also being named as 'key date'.
- now if I have transaction data (FACT table) of product sales, i.e:
SALES_DATE, PRODUCT_ID, STORE_ID, ....
I would like to show the Sold Products at Store at certain date and also show
the Description of the product at the date of product sale (assuming here that
there is product catalog which can be updated independently, but for auditing
purposes the original product description used during sale is needed to be
displayed/used).
The SQL for the temporal join would be then:
SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION
FROM SALES as S LEFT OUTER JOIN PRODUCT as P
ON S.PRODUCT_ID = P.PRODUCT_ID
AND S.SALES_DATE >= P.DATE_FROM AND
AND S.SALES_DATE <= P.DATE_TO
(also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but in
this case it won't be the proper join - we need to show the product sales even
the description wasn't maintained in product master data)
(some more details for temporal joins - see i.e. here -
http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
)
This scenario can be supported by Kylin if following enhancement would be done:
1) The Cube Model alloiwng to define special variant of LEFT OUTER and INNER
joins (suggesting name temporal (left outer/inner) join) which forces to
specify a 'key date' as a expression (column / constant / ...) from the FACT
table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP table/
Those 2 validity fields are defining master data record validity period.
Supported types for those fields should be DATE, optionally TIMESTAMP is also
fine but rarely used in business scenarios.
Other option rather then defining new join type is to loosen the join condition
and allowing <= and >= operands to be used as part of the LOOKUP join
definition.
2) The Cube Definition then needs to know the extension of the join type in the
cube model and needs to force the additional fields (key-date, valid-from,
valid-to) be part of the whole cube structure.
3) Very often the time-partition field of the cube which is being used for
incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be
hard-coded this way as this is not true for every scenario.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)