The design of lookup_table1 is problematic; you cannot join it with tables like fact_table2 or fact_table3 on region_id alone. Here is an example:
lookup_table1 (assume it has two records with region_id R1):

region_id, region_name, province_id, province_name, city_id, city_name
R1, EAST, P1, ANHUI, C1, HEFEI
R1, EAST, P2, JIANGSU, C2, NANJING

fact_table2 (assume it has only one record, with region_id R1):

region_id, year_id, M1, M2, ..., Mn
R1, 2015, m1, m2, ...

Join fact_table2 with lookup_table1 on region_id and you will get two records:

R1, EAST, 2015, m1, m2, ...
R1, EAST, 2015, m1, m2, ...

This is wrong, as the measures are repeated. A better design would be to have a lookup table for each layer:

Region: region_id, region_name
Province: province_id, province_name
City: city_id, city_name

Then all the names can be derived from the IDs on the fact table. Just my two cents; let me know if I'm wrong...

On 6/17/15, 9:39 PM, "dong wang" <[email protected]> wrote:

>Hi Shaofeng, Hongbin,
>
>For case 1: there seems to be a weakness. Suppose we have a star-schema
>model:
>
>lookup_table1:
>region_id, region_name, province_id, province_name, city_id, city_name
>
>fact_table1:
>region_id, province_id, city_id, M1, M2, ..., Mn
>
>fact_table2:
>region_id, year_id, M1, M2, ..., Mn
>
>fact_table3:
>region_id, province_id, quarter_id, M1, M2, ..., Mn
>
>As mentioned, only fact_table1 can successfully join with lookup_table1;
>fact_table2 and fact_table3 cannot share lookup_table1. Is that right?
>
>For case 2: sorry for the misunderstanding. What I actually mean is that
>if the VALUE of city_name changes for some rows, we have to build the
>whole cube again, right? Could we instead scan all the IDs and values in
>the fact table and build a hashmap, and before Kylin returns data to the
>user, join only the LITTLE amount of qualified query data with that
>hashmap? The reason I think we should support such a feature is that on a
>small Hadoop cluster with a HUGE amount of data, the Hive join in the
>first cube-building step is VERY time-consuming!
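The fan-out described above can be reproduced with a toy join. This is a minimal sketch using SQLite from Python; table and column names follow the example in the thread, and the single measure m1 stands in for M1..Mn:

```python
import sqlite3

# In-memory database with the example tables from the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lookup_table1 (
    region_id TEXT, region_name TEXT,
    province_id TEXT, province_name TEXT,
    city_id TEXT, city_name TEXT);
INSERT INTO lookup_table1 VALUES
    ('R1', 'EAST', 'P1', 'ANHUI',   'C1', 'HEFEI'),
    ('R1', 'EAST', 'P2', 'JIANGSU', 'C2', 'NANJING');

CREATE TABLE fact_table2 (region_id TEXT, year_id INTEGER, m1 REAL);
INSERT INTO fact_table2 VALUES ('R1', 2015, 100.0);
""")

# Joining on region_id alone fans out: the single fact row matches
# both lookup rows, so the measure m1 is duplicated.
rows = conn.execute("""
    SELECT f.region_id, l.region_name, f.year_id, f.m1
    FROM fact_table2 f
    JOIN lookup_table1 l ON f.region_id = l.region_id
""").fetchall()
print(rows)  # two identical rows; SUM(m1) would report 200, not 100

# With a per-layer lookup table (Region: region_id, region_name) the
# join key is unique, so the join preserves the fact-row count.
conn.executescript("""
CREATE TABLE region (region_id TEXT PRIMARY KEY, region_name TEXT);
INSERT INTO region VALUES ('R1', 'EAST');
""")
rows2 = conn.execute("""
    SELECT f.region_id, r.region_name, f.year_id, f.m1
    FROM fact_table2 f
    JOIN region r ON f.region_id = r.region_id
""").fetchall()
print(rows2)  # exactly one row; measures are not inflated
```

The key point is that the measure inflation comes from the lookup table's non-unique join key, which is exactly why the per-layer design avoids it.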
>
>2015-06-17 17:27 GMT+08:00 Shi, Shaofeng <[email protected]>:
>
>> Hi dong, "Derived" is only for lookup tables (and the lookup table
>> needs to ensure the uniqueness of its PK, so with one PK we can find
>> exactly one value for the derived column).
>>
>> For case 1, Kylin will not support deriving from the fact table. The
>> problem is: how can we know/ensure that one region_id maps to only one
>> region_name if you repeat region_id and region_name in each row of the
>> fact table? If there is more than one mapping, which one should be
>> taken?
>>
>> If a table or column name is changed, the cubes need to be updated to
>> use the new name, and a rebuild is needed; so you should avoid changing
>> table/column names. (I guess that's not what you mean, is it?)
>>
>> On 6/17/15, 2:50 PM, "dong wang" <[email protected]> wrote:
>>
>> >Usually we have two common cases:
>> >
>> >1. We only have a fact table that has already been joined with the
>> >lookup tables; take it as one wide table.
>> >
>> >2. We have a star-schema data model, and when defining the cube we
>> >define the join between the fact table and the lookup tables.
>> >
>> >Currently:
>> >
>> >For the 1st case, we cannot create derived dimensions based on the
>> >fact table; they can be created only from lookup tables. For example,
>> >if the fact table is: region_id, region_name, province_id,
>> >province_name, city_id, city_name, M1, M2, ..., Mn, we want to create
>> >a hierarchy dimension region_id -> province_id -> city_id, and a
>> >derived dimension (region_name, province_name, city_name) as well. If
>> >we could achieve this, there would be no complicated joins at all,
>> >and the join takes much time in the first step of building the cube.
>> >Will Kylin support such a dimension design?
>> >
>> >For the 2nd case, even though we make the join, if the column names
>> >are changed for some reason, do we have to build the whole cube again
>> >instead of only updating the dimension data?
>>
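Dong's case-2 suggestion, keeping only IDs in the cube and joining the small, already-aggregated query result against an id-to-name hashmap just before returning it, can be sketched like this. This is a hypothetical illustration, not Kylin's actual API; the function name and row shapes are made up for the example:

```python
# Hypothetical sketch of the "late join" idea from case 2: the cube
# stores only city_id, and names are attached at query-return time.
city_names = {"C1": "HEFEI", "C2": "NANJING"}  # small id -> name hashmap

def decorate_results(query_rows, name_map):
    """Attach city_name to each qualified result row of (city_id, measure)."""
    return [
        {"city_id": cid, "city_name": name_map.get(cid), "measure": m}
        for cid, m in query_rows
    ]

# The aggregated query result is tiny compared to the fact table, so
# this in-memory join is cheap; if a city_name value changes, only the
# hashmap needs rebuilding, not the whole cube.
rows = decorate_results([("C1", 42.0), ("C2", 7.5)], city_names)
print(rows)
```

This mirrors the trade-off discussed in the thread: the expensive Hive join over the full fact table is avoided, at the cost of maintaining the id-to-name map outside the cube.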
