1. If you want to get derived columns in your query result, all PK/FK (related with that lookup table) need appear in the SELECT statement; Ofcourse it also must appear in the join parts, to match with the cube's definition;
2. If all PK appeared in SQL, approach 1 and 2 are the same (just let Kylin know those *_names can be derived); 3. Yest that is; In your case "region_id" is not the primary key, so don't use it alone to join with the lookup table; Come back to your original case, you can define all these column as a hierarchy: [region_id, region_name, province_id, province_name, city_id, city_name] Although it takes more storage space than the "derived" cube, there is no restriction on your query SQL; 2015-06-16 13:55 GMT+08:00 dong wang <[email protected]>: > Thanks shaofeng~ > 1, you mentioned that all the PK columns should be included in the > SQL/Query, does it mean that all the PK columns should be only in the > "JOIN" parts or be only in the "SELECT" parts, or either is OK as long as > it appears in the SQL statement? > > 2, suppose that all the PK columns are all in the SQL/Query based on the > answer of the 1st question above, which of the above approaches to create > derived dimension is the correct approach? !!! > > 3, as we know that Kylin wants user to have a STAR schema model, then take > the following as an example: > ====================Hive tables======================== > fact table: ft > region_id > m1 > m2 > > > lookup table: lk1 > region_id > region_name > province_id > province_name > city_id > city_name > > as we can see that the fact table ft only has on region_id, thus when > defining a cube with join, we can only join "region_id" on fact table 'ft' > and region_id on lookup talbe 'lk1', suppose the primary key of lk1 are > compound keys(region_id + province_id + city_id), as tested just now, we > cannot build such a cube successfully due to a "duplicate key" error since > only region_id cannot determine one unique record for the lookup table > 'lk1', is it related to what you mentioned above? > > 2015-06-16 13:31 GMT+08:00 ShaoFeng Shi <[email protected]>: > > > I just realized that both approaches may not work as you expected; > > > > The "derived" columns are derived from foreign keys, in your case the > fk/pk > > are composited keys: region_id + province_id + city_id; That's to say, > only > > when all these columns appeared in your query, Kylin can derive the > names, > > even if one region_id can determin the region name; > > > > Only if you're sure all these FKs always appeared in SQL, you can use the > > derived columns in this way; Otherwise, don't expect Kylin can smartly > > conclude the region name with just region_id, as region_id is not PK, > there > > is no guarrantee on the uniqueness; > > > > > > > > 2015-06-16 12:00 GMT+08:00 dong wang <[email protected]>: > > > > > Approach 1: > > > > > > Dimension > > > Hierarhy 1: region_id -> province_id -> city_id (NOTE: all of these 3 > > > columns come from fact table: ft) > > > Derived 1: region_id, region_name (NOTE: it comes from lookup table: > lk1) > > > Derived 2: province_id, province_name(NOTE: it comes from lookup table: > > > lk1) > > > Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1) > > > > > > Approach 2 > > > Dimension > > > Hierarhy 1: region_id -> province_id -> city_id (NOTE: all of these 3 > > > columns come from fact table: ft) > > > Derive 1: (region_name, province_name, city_name) //all is in one > > derived > > > dimension, and which comes from lookup table: lk1 > > > > > > I am not sure which of the above approaches is the correct way? please > > help > > > to confirm? > > > > > > 2015-06-16 11:12 GMT+08:00 Shi, Shaofeng <[email protected]>: > > > > > > > Yes it is expected; The ³derived² columns are those can be derived by > > > fks; > > > > So in the cube we only keep the *_ids in dimensions; Kylin will take > > > > snapshot for the lookup table and load them into memory, in run-time > it > > > > can map the ids to the names; > > > > > > > > > > > > On 6/15/15, 4:34 PM, "dong wang" <[email protected]> wrote: > > > > > > > > >====================Hive tables======================== > > > > >fact table: ft > > > > >region_id > > > > >province_id > > > > >city_id > > > > > > > > > >lookup table: lk1 > > > > >region_id > > > > >region_name > > > > >province_id > > > > >province_name > > > > >city_id > > > > >city_name > > > > > > > > > > > > > > >====================Cube Design======================== > > > > > > > > > >JOIN: inner > > > > > > > > > >ft.region_id=lk1.region_id > > > > >and ft.province_id=lk1.province_id > > > > >and ft.city_id=lk1.city_id > > > > > > > > > > > > > > >Dimension > > > > > > > > > >Hierarhy 1: region_id -> province_id -> city_id (NOTE: all of > these > > 3 > > > > >columns come from fact table: ft) > > > > >Derived 1: region_id, region_name (NOTE: it comes from lookup table: > > > lk1) > > > > >Derived 2: province_id, province_name(NOTE: it comes from lookup > > table: > > > > >lk1) > > > > >Derived 3: city_id, city_name(NOTE: it comes from lookup table: lk1) > > > > > > > > > > > > > > >is it correct to create the derived dimensions? > > > > > > > > > > > > > > >====================Cube Building======================== > > > > >Build the cube: > > > > > > > > > >when the first step, the log says as below: > > > > > > > > > >CREATE EXTERNAL TABLE IF NOT EXISTS > > > > > > > > > > >kylin_intermediate_test_cube_1_20150101000000_20150601000000_635c985b_3212 > > > > >_4dd5_9ffa_f100e4c323e1 > > > > >( > > > > >test_cube_1_DT string > > > > >,test_cube_1_region_id int > > > > >,test_cube_1_province_id int > > > > >,test_cube_1_city_id int > > > > >,test_cube_1_m1 bigint > > > > >,test_cube_1_m2 bigint > > > > >) > > > > >ROW FORMAT DELIMITED FIELDS TERMINATED BY '\177' > > > > >STORED AS SEQUENCEFILE > > > > > > > > > > > > > > >we can see that "region_name", "province_name", "city_name" don't > > appear > > > > >in > > > > >the intermediate hive table created in the 1st step, instead, it > only > > > > >contains "*_id" columns as listed above, is it expected? > > > > > > > > > > > > > >
