[ https://issues.apache.org/jira/browse/HIVE-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13593316#comment-13593316 ]
Amareshwari Sriramadasu commented on HIVE-4115: ----------------------------------------------- Illustrating above model with an example : * Define a SALES_CUBE cube with measures : "Sales, Discount" and Dimensions: "CustomerID, Location, Transaction-time" * Dimensions: ** CustomerID is a simple dimension which refers to the customer table on column ID. CustomerTable is having the schema : "ID, Age, Gender" ** Location is hierarchical dimension with the hierarchy : "Zipcode, CityID, StateID, CountryID, RegionID" *** Zipcode refers to ZipTable on column code. ZipTable schema : "code, street-name, cityID, stateID" *** CityID refers to cityTable on column ID. CityTable schema : "ID, name, stateID" *** stateID refers to stateTable on column ID. StateTable schema : "ID, name, capital, countryID" *** countryID refers to counteryTable on column ID. CounterTable : "ID, name, capital, Region" *** Region is an inline dimension with values "'APAC', 'EMEA', 'USA'" ** Transaction-time is simple dimension with timestamp field. * Facts :Sales_cube can have the following fact tables : ## RawFact with columns "Sales, Discount, CustomerId, ZipCode, Transaction-time" ## CountryFact with columns "Sales, Discount, CountryID" Physical storage tables : ------------------------------------ In the example described above say that RawFact is rolled hourly in Cluster c1, is rolled daily and monthly on Cluster C2; CountryFact is rolled daily, monthly, quarterly and yearly on Cluster C2; Also, Customer table is available in HBase cluster H1; All the location tables are available in HDFS cluster C2. The physical tables would be : * C1_Rawfact_hourly - schema : "Sales, Discount, CustomerId, ZipCode, Transaction-time" Partitioned by dt and state. * C2_Rawfact_daily - schema : "Sales, Discount, CustomerId, ZipCode, Transaction-time" Partitioned by dt and state. * C2_Rawfact_monthly - schema : "Sales, Discount, CustomerId, ZipCode, Transaction-time" Partitioned by dt and state. * C2_CountryFact_daily - Schema : "Sales, Discount, CountryID" Partitioned by dt * C2_CountryFact_monthly - Schema : "Sales, Discount, CountryID" Partitioned by dt * C2_CountryFact_quarterly - Schema : "Sales, Discount, CountryID" Partitioned by dt * C2_CountryFact_yearly - Schema : "Sales, Discount, CountryID" Partitioned by dt * H1_CustomerTable - schema : "ID, Age, Gender" * C2_ZipTable - schema : "code, street-name, cityID, stateID" * C2_CityTable - schema : "ID, name, stateID" * C2_StateTable -schema : "ID, name, capital, countryID" * C2_CountryTable -schema : "ID, name, capital, Region" If User queries the data on cube with a query like the following : * Select sales from SALES_CUBE where region = 'APAC' and time_range_in(09/01/2012, 12/31/2012) // Q4 -2012. Cube Abstraction provided would be smart enough to figure out which table to go and give the result . In this case the query translates to : * Select sales from C2_CountryFact_quarterly join C2_countryTable on C2_CountryFact_quarterly.CountryID = C2_countryTable.ID where dt = "Q4-2012" and C2_countryTable.region = 'APAC'; > Introduce cube abstraction in hive > ---------------------------------- > > Key: HIVE-4115 > URL: https://issues.apache.org/jira/browse/HIVE-4115 > Project: Hive > Issue Type: New Feature > Reporter: Amareshwari Sriramadasu > Assignee: Amareshwari Sriramadasu > > We would like to define a cube abstraction so that user can query at cube > layer and do not know anything about storage and rollups. > Will describe the model more in following comments. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira