[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

Richard Calaba (JIRA) Wed, 15 Jun 2016 14:04:35 -0700

    [ 
https://issues.apache.org/jira/browse/KYLIN-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15330956#comment-15330956
 ]


Richard Calaba edited comment on KYLIN-1576 at 6/15/16 9:03 PM:
----------------------------------------------------------------

Interestingly - I just found out that Apache Drill supports this scenario - at 
least 2 yars ago they pacthed it to support this if at least one equi join 
condition is used: https://issues.apache.org/jira/browse/DRILL-485 

So questions:

 1) Why Hive cannot implement same - seems the argument in the docu doesn't 
hold anymore ...
 2) Should / Can Kyling Cube Build use Apache Drill while builiding the data 
cubes ??? 

Update in regards to the 2) - Can Kylin Cube Build use Apache Drill ... seems 
this is on eBay roadmap ... considering Slide 68 here: 
http://events.mapr.com/BayAreaApacheDrill and here 
https://www.slideshare.net/secret/lMvMrzy9mFeyBP/?utm_source=ion&utm_medium=Meetups&utm_campaign=ION_MKT_HUG_ApacheDrill_GA.
 Morover seems they might implement the idea of running Drill Query in case 
Kylin Query cannot be evaluated ... that's awesome ... and the same idea we are 
trying to implement at Fishbowl
  


was (Author: cal...@gmail.com):
Interestingly - I just found out that Apache Drill supports this scenario - at 
least 2 yars ago they pacthed it to support this if at least one equi join 
condition is used: https://issues.apache.org/jira/browse/DRILL-485 

So questions:

 1) Why Hive cannot implement same - seems the argument in the docu doesn't 
hold anymore ...
 2) Should / Can Kyling Cube Build use Apache Drill while builiding the data 
cubes ??? 

> Support of new join type in the Cube Model - Temporal Join
> ----------------------------------------------------------
>
>                 Key: KYLIN-1576
>                 URL: https://issues.apache.org/jira/browse/KYLIN-1576
>             Project: Kylin
>          Issue Type: New Feature
>          Components: General
>    Affects Versions: Future
>            Reporter: Richard Calaba
>            Priority: Blocker
>
> There is a notion of time-dependent master data in many business scenarios. 
> Typically modeled with granularity 1 day (datefrom, dateto fields of type 
> DATE defining validity time of one master data record). Occasionally you can 
> think of lower granularity so use of TIMESTAMP can be also seen as an valid 
> scenario). Example of such master data definition could be:
> Master Data / Dimension Table:
> =========================
> KEY: PRODUCT_ID, DATE_TO, 
> NON-KEY: DATE_FROM, PRODUCT_DESCRIPTION
> - assuming that PRODUCT_DESCRIPTION cannot have 2 values during one day it is 
> assumed that DATE_TO <= DATE_TO and also that there are no overlapping 
> intervals (DATE_FROM, DATE_TO) for all PRODUCT master data
> - the KEY is then intentionally defined as (PRODUCT_ID, DATE_TO) so the 
> statment SELECT * from PRODUCT WHERE ID = 'prod_key_1' AND DATE_TO >= 
> today/now and DATE_FROM <= today/now is efficient way to retrieve 'current' 
> PRODUCT master data (description). The today/now is also being named as 'key 
> date'.
> - now if I have transaction data (FACT table) of product sales, i.e:
> SALES_DATE, PRODUCT_ID, STORE_ID, ....
> I would like to show the Sold Products at Store at certain date and also show 
> the Description of the product at the date of product sale (assuming here 
> that there is product catalog which can be updated independently, but for 
> auditing purposes the original product description used during sale is needed 
> to be displayed/used).
> The SQL for the temporal join would be then:
> SELECT S.PRODUCT_ID, S.SALES_DATE, P.PRODUCT_DESCRIPTION 
> FROM SALES as S LEFT OUTER JOIN PRODUCT as P 
> ON S.PRODUCT_ID = P.PRODUCT_ID 
> AND S.SALES_DATE >= P.DATE_FROM AND 
> AND S.SALES_DATE <= P.DATE_TO 
> (also INNER TEMPORAL JOIN can be defined and be valid in some scenarios but 
> in this case it won't be the proper join - we need to show the product sales 
> even the description wasn't maintained in product master data)
> (some more details for temporal joins - see i.e. here - 
> http://scn.sap.com/community/hana-in-memory/blog/2014/09/28/using-temporal-join-to-fetch-the-result-set-within-the-time-interval
>  )
> This scenario can be supported by Kylin if following enhancement would be 
> done:
> 1) The Cube Model allowing to define special variant of LEFT OUTER and INNER 
> joins (suggesting name temporal (left outer/inner) join) which forces to 
> specify a 'key date' as a expression (column / constant / ...) from the FACT 
> table and 2 validity fields ('valid from' and 'valid to') fro the LOOKUP 
> table/ Those 2 validity fields are defining master data record validity 
> period. Supported types for those fields should be DATE, optionally TIMESTAMP 
> is also fine but rarely used in business scenarios. 
> Other option rather then defining new join type is to loosen the join 
> condition and allowing <= and >= operands to be used as part of the LOOKUP 
> join definition.
> 2) The Cube Definition then needs to know the extension of the join type in 
> the cube model and needs to force the additional fields (key-date, 
> valid-from, valid-to) be part of the whole cube structure.  Or alternatively 
> cube definition for derived dimensions can be extended to define a 
> "time-dependent derived lookup" similar way as described in the step 1) for 
> the suggested cube model join type extension. 
> 3) Very often the time-partition field of the cube which is being used for 
> incremental data loads to cubes will be the 'key-date'. BUT this shouldn't be 
> hard-coded this way as this is not true for every scenario. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Comment Edited] (KYLIN-1576) Support of new join type in the Cube Model - Temporal Join

Reply via email to