I just realized that both approaches may not work as you expected;

The "derived" columns are derived from foreign keys, in your case the fk/pk
are composited keys: region_id + province_id + city_id; That's to say, only
when all these columns appeared in your query, Kylin can derive the names,
even if one region_id can determin the region name;

Only if you're sure all these FKs always appeared in SQL, you can use the
derived columns in this way; Otherwise, don't expect Kylin can smartly
conclude the region name with just region_id, as region_id is not PK, there
is no guarrantee on the uniqueness;



2015-06-16 12:00 GMT+08:00 dong wang <[email protected]>:

> Approach 1:
>
> Dimension
> Hierarhy 1:  region_id -> province_id -> city_id  (NOTE: all of these 3
> columns come from fact table: ft)
> Derived 1: region_id, region_name (NOTE: it comes from lookup table: lk1)
> Derived 2: province_id, province_name(NOTE: it comes from lookup table:
> lk1)
> Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1)
>
> Approach 2
> Dimension
> Hierarhy 1:  region_id -> province_id -> city_id  (NOTE: all of these 3
> columns come from fact table: ft)
> Derive 1: (region_name, province_name, city_name)  //all is in one derived
> dimension, and which comes from lookup table: lk1
>
> I am not sure which of the above approaches is the correct way? please help
> to confirm?
>
> 2015-06-16 11:12 GMT+08:00 Shi, Shaofeng <[email protected]>:
>
> > Yes it is expected; The ³derived² columns are those can be derived by
> fks;
> > So in the cube we only keep the *_ids in dimensions; Kylin will take
> > snapshot for the lookup table and load them into memory, in run-time it
> > can map the ids to the names;
> >
> >
> > On 6/15/15, 4:34 PM, "dong wang" <[email protected]> wrote:
> >
> > >====================Hive tables========================
> > >fact table: ft
> > >region_id
> > >province_id
> > >city_id
> > >
> > >lookup table: lk1
> > >region_id
> > >region_name
> > >province_id
> > >province_name
> > >city_id
> > >city_name
> > >
> > >
> > >====================Cube Design========================
> > >
> > >JOIN:   inner
> > >
> > >ft.region_id=lk1.region_id
> > >and ft.province_id=lk1.province_id
> > >and ft.city_id=lk1.city_id
> > >
> > >
> > >Dimension
> > >
> > >Hierarhy 1:  region_id -> province_id -> city_id  (NOTE: all of these 3
> > >columns come from fact table: ft)
> > >Derived 1: region_id, region_name (NOTE: it comes from lookup table:
> lk1)
> > >Derived 2: province_id, province_name(NOTE: it comes from lookup table:
> > >lk1)
> > >Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1)
> > >
> > >
> > >is it correct to create the derived dimensions?
> > >
> > >
> > >====================Cube Building========================
> > >Build the cube:
> > >
> > >when the first step, the log says as below:
> > >
> > >CREATE EXTERNAL TABLE IF NOT EXISTS
> >
> >kylin_intermediate_test_cube_1_20150101000000_20150601000000_635c985b_3212
> > >_4dd5_9ffa_f100e4c323e1
> > >(
> > >test_cube_1_DT string
> > >,test_cube_1_region_id int
> > >,test_cube_1_province_id int
> > >,test_cube_1_city_id int
> > >,test_cube_1_m1 bigint
> > >,test_cube_1_m2 bigint
> > >)
> > >ROW FORMAT DELIMITED FIELDS TERMINATED BY '\177'
> > >STORED AS SEQUENCEFILE
> > >
> > >
> > >we can see that "region_name", "province_name", "city_name" don't appear
> > >in
> > >the intermediate hive table created in the 1st step, instead, it only
> > >contains "*_id" columns as listed above,  is it expected?
> >
> >
>

Reply via email to