[ 
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330573#comment-15330573
 ] 

Richard Calaba commented on KYLIN-1576:
---------------------------------------

Houston, we have a problem - according to 
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins 

"Only equality joins, outer joins, and left semi joins are supported in Hive. 
Hive does not support join conditions that are not equality conditions as it is 
very difficult to express such conditions as a map/reduce job. Also, more than 
two tables can be joined in Hive."

Any ideas ... ?

> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
>                 Key: KYLIN-1576
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1576
>             Project: Kylin
>          Issue Type: New Feature
>          Components: General
>    Affects Versions: Future
>            Reporter: Richard Calaba
>            Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios. 
> Typically modeled with granularity 1 day (datefrom, dateto fields of type 
> DATE defining validity time of one master data record). Occasionally you can 
> think of lower granularity so use of TIMESTAMP can be also seen as an valid 
> scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =========================
> KEY: PRODUCT_ID, DATE_TO, 
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is 
> assumed that DATE_TO <= DATE_TO and also that there are no overlapping 
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the 
> statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= 
> today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' 
> PRODUCT master data (description). The today/now is also being named as 'key 
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, ....
> I would like to show the Sold Products at Store at certain date and also show 
> the Description of the product at the date of product sale (assuming here 
> that there is product catalog which can be updated independently, but for 
> auditing purposes the original product description used during sale is needed 
> to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION 
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P 
> ON S.PRODUCT_ID = P.PRODUCT_ID 
> AND S.SALES_DATE >= P.DATE_FROM AND 
> AND S.SALES_DATE <= P.DATE_TO 
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but 
> in this case it won't be the proper join - we need to show the product sales 
> even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here - 
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
>  )
> This scenario can be supported by Kylin if following enhancement would be 
> done:
> 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER 
> joins (suggesting name temporal (left outer/inner) join) which forces to 
> specify a 'key date' as a expression (column / constant / ...) from the FACT 
> table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP 
> table/ Those 2 validity fields are defining master data record validity 
> period. Supported types for those fields should be DATE, optionally TIMESTAMP 
> is also fine but rarely used in business scenarios. 
> Other option rather then defining new join type is to loosen the join 
> condition and allowing <= and >= operands to be used as part of the LOOKUP 
> join definition.
> 2) The Cube Definition then needs to know the extension of the join type in 
> the cube model and needs to force the additional fields (key-date, 
> valid-from, valid-to) be part of the whole cube structure.  Or alternatively 
> cube definition for derived dimensions can be extended to define a 
> "time-dependent derived lookup" similar way as described in the step 1) for 
> the suggested cube model join type extension. 
> 3) Very often the time-partition field of the cube which is being used for 
> incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be 
> hard-coded this way as this is not true for every scenario. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to