Hi shaofeng, hongbin, for case 1: there should be some weakness, suppose that we have a "STAR" model schema:
lookup_table1: region_id, region_name, province_id, province_name, city_id, city_name fact_table1: region_id, province_id, city_id, M1, M2, Mn fact_table2: region_id, year_id, M1, M2,..., Mn fact_table3: region_id, province_id, quarter_id, M1, M2, Mn as mentioned, only the "fact_table1" can successfully join with "lookup_table1", "fact_table2" and "fact_table3" cannot share "lookup_table1", is it right? for case 2: sorry for the misunderstanding, what I mean actually is that if the VALUE of the city_name for some rows is changed, we will have to totally built the cube again, is it right? can we scan all the id and values in the fact table and build a hashmap, and before kylin returns the data to the user end, we only do the JOIN with the LITTLE amount of the qualified query data with the hashmap, and the reason why I think we should support similar feature is that for a small hadoop cluster and HUGE amount of data, it is VERY time-consuming to do the HIVE join when the first cube building step! 2015-06-17 17:27 GMT+08:00 Shi, Shaofeng <[email protected]>: > Hi dong, ³Derived² is only for lookup table (and the lookup table need > ensure the uniqueness of PK, so with one PK we can exactly find one value > for the derived column); > > For case 1, Kylin will not support derive from fact table; The problem is, > how can we know/ensure 1 region_id only has only 1 region_name mapped if > you repeats region_id and region_name in each row of fact table? If there > are more than 1 mapping, which one should be taken? > > If a table column name be changed, the cubes need be updated to use the > new name, and a re-build is needed; So you need avoid to change > table/column name; (I guess you¹re not mean this, is it?) > > On 6/17/15, 2:50 PM, "dong wang" <[email protected]> wrote: > > >usually, we may have 2 common cases: > >1, only have a fact table which has already been joined with lookup > >tables, > >take it as a wide table > > > >2, we have star schema data model, and when defining the cube, we define > >the join with the fact table and lookup tables, > > > >currently, > > > >to the 1st case, we cannot create derived dimensions based on the fact > >table which can be created only from the lookup tables, for example, we > >have fact table: region_id, region_name, province_id, province_name, > >city_id, city_name, M1, M2, ...Mn, we want to create a hierarchy > >dimension: > >region_id->province_id->city_id, and a derived dimenison: (region_name, > >province_name, city_name) as well, if we can achieve it, there will be no > >complicated joins at all, and also, the join will take much time in the > >first step when building the cube~ thus, will we support such demension > >design? > > > >to the 2nd case, even though we make the join, however, if the column > >names > >are changed for some reasons, we will have to build the whole cube again > >instead of only update the dimension data? > >
