Approach 1:

Dimension
Hierarhy 1:  region_id -> province_id -> city_id  (NOTE: all of these 3
columns come from fact table: ft)
Derived 1: region_id, region_name (NOTE: it comes from lookup table: lk1)
Derived 2: province_id, province_name(NOTE: it comes from lookup table: lk1)
Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1)

Approach 2
Dimension
Hierarhy 1:  region_id -> province_id -> city_id  (NOTE: all of these 3
columns come from fact table: ft)
Derive 1: (region_name, province_name, city_name)  //all is in one derived
dimension, and which comes from lookup table: lk1

I am not sure which of the above approaches is the correct way? please help
to confirm?

2015-06-16 11:12 GMT+08:00 Shi, Shaofeng <[email protected]>:

> Yes it is expected; The ³derived² columns are those can be derived by fks;
> So in the cube we only keep the *_ids in dimensions; Kylin will take
> snapshot for the lookup table and load them into memory, in run-time it
> can map the ids to the names;
>
>
> On 6/15/15, 4:34 PM, "dong wang" <[email protected]> wrote:
>
> >====================Hive tables========================
> >fact table: ft
> >region_id
> >province_id
> >city_id
> >
> >lookup table: lk1
> >region_id
> >region_name
> >province_id
> >province_name
> >city_id
> >city_name
> >
> >
> >====================Cube Design========================
> >
> >JOIN:   inner
> >
> >ft.region_id=lk1.region_id
> >and ft.province_id=lk1.province_id
> >and ft.city_id=lk1.city_id
> >
> >
> >Dimension
> >
> >Hierarhy 1:  region_id -> province_id -> city_id  (NOTE: all of these 3
> >columns come from fact table: ft)
> >Derived 1: region_id, region_name (NOTE: it comes from lookup table: lk1)
> >Derived 2: province_id, province_name(NOTE: it comes from lookup table:
> >lk1)
> >Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1)
> >
> >
> >is it correct to create the derived dimensions?
> >
> >
> >====================Cube Building========================
> >Build the cube:
> >
> >when the first step, the log says as below:
> >
> >CREATE EXTERNAL TABLE IF NOT EXISTS
> >kylin_intermediate_test_cube_1_20150101000000_20150601000000_635c985b_3212
> >_4dd5_9ffa_f100e4c323e1
> >(
> >test_cube_1_DT string
> >,test_cube_1_region_id int
> >,test_cube_1_province_id int
> >,test_cube_1_city_id int
> >,test_cube_1_m1 bigint
> >,test_cube_1_m2 bigint
> >)
> >ROW FORMAT DELIMITED FIELDS TERMINATED BY '\177'
> >STORED AS SEQUENCEFILE
> >
> >
> >we can see that "region_name", "province_name", "city_name" don't appear
> >in
> >the intermediate hive table created in the 1st step, instead, it only
> >contains "*_id" columns as listed above,  is it expected?
>
>

Reply via email to