Approach 1: Dimension Hierarhy 1: region_id -> province_id -> city_id (NOTE: all of these 3 columns come from fact table: ft) Derived 1: region_id, region_name (NOTE: it comes from lookup table: lk1) Derived 2: province_id, province_name(NOTE: it comes from lookup table: lk1) Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1)
Approach 2 Dimension Hierarhy 1: region_id -> province_id -> city_id (NOTE: all of these 3 columns come from fact table: ft) Derive 1: (region_name, province_name, city_name) //all is in one derived dimension, and which comes from lookup table: lk1 I am not sure which of the above approaches is the correct way? please help to confirm? 2015-06-16 11:12 GMT+08:00 Shi, Shaofeng <[email protected]>: > Yes it is expected; The ³derived² columns are those can be derived by fks; > So in the cube we only keep the *_ids in dimensions; Kylin will take > snapshot for the lookup table and load them into memory, in run-time it > can map the ids to the names; > > > On 6/15/15, 4:34 PM, "dong wang" <[email protected]> wrote: > > >====================Hive tables======================== > >fact table: ft > >region_id > >province_id > >city_id > > > >lookup table: lk1 > >region_id > >region_name > >province_id > >province_name > >city_id > >city_name > > > > > >====================Cube Design======================== > > > >JOIN: inner > > > >ft.region_id=lk1.region_id > >and ft.province_id=lk1.province_id > >and ft.city_id=lk1.city_id > > > > > >Dimension > > > >Hierarhy 1: region_id -> province_id -> city_id (NOTE: all of these 3 > >columns come from fact table: ft) > >Derived 1: region_id, region_name (NOTE: it comes from lookup table: lk1) > >Derived 2: province_id, province_name(NOTE: it comes from lookup table: > >lk1) > >Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1) > > > > > >is it correct to create the derived dimensions? > > > > > >====================Cube Building======================== > >Build the cube: > > > >when the first step, the log says as below: > > > >CREATE EXTERNAL TABLE IF NOT EXISTS > >kylin_intermediate_test_cube_1_20150101000000_20150601000000_635c985b_3212 > >_4dd5_9ffa_f100e4c323e1 > >( > >test_cube_1_DT string > >,test_cube_1_region_id int > >,test_cube_1_province_id int > >,test_cube_1_city_id int > >,test_cube_1_m1 bigint > >,test_cube_1_m2 bigint > >) > >ROW FORMAT DELIMITED FIELDS TERMINATED BY '\177' > >STORED AS SEQUENCEFILE > > > > > >we can see that "region_name", "province_name", "city_name" don't appear > >in > >the intermediate hive table created in the 1st step, instead, it only > >contains "*_id" columns as listed above, is it expected? > >
