lenmom opened a new issue #545: surrogate key gerneration requested. URL: https://github.com/apache/incubator-doris/issues/545 **scenario description:** supose we have a situation in **dim_sales_person(employee_id int , department int,...)**, and the fact table **fact_sales(order_id int, sales_person_id, department_id int,...)**. in case we calculate the detapartment sales amount based on the fact table **fact_sales**, if the sales person have changed department and we have no surrogate key gerneration mechanism,so all the sales metric for the sales person would be calculated to the persion's current department, and this is incorrect. a common solution for this scenario is to generate a new record for sales person who has changed department, and generate new record and insert to the **dim_sales_person** table, and the fact table would reference the new generated record's primary key. and set the previous record expire time to the day when the person's department has changed. the table schema look as follows: CREATE TABLE dim_sales_person ( sales_person_sk int , employee_id int , department varchar(128), . . . effective_date date, expiry_date date ) this is called slow changed dimention type 2, and we should record the changing attibute history. In hive, it has already support such feture, and the solution looks like as follows: -- 将有地址变化的插入到dim_customer表,如果有相同数据存在有不过期的数据则不插入 INSERT INTO dw_sales.dim_customer SELECT **row_number() over (ORDER BY t1.customer_number) + t2.sk_max**, t1.customer_number, t1.customer_name, t1.customer_street_address, t1.customer_zip_code, t1.customer_city, t1.customer_state, t1.version, t1.effective_date, t1.expiry_date FROM(SELECT t2.customer_number customer_number, t2.customer_name customer_name, t2.customer_street_address customer_street_address, t2.customer_zip_code, t2.customer_city, t2.customer_state, t1.version + 1 `version`, ${hivevar:pre_date} effective_date, ${hivevar:max_date} expiry_date FROM dw_sales.dim_customer t1 INNER JOIN ods_sales.ods_customer t2 ON t1.customer_number = t2.customer_number AND t1.expiry_date = ${hivevar:pre_date} LEFT JOIN dw_sales.dim_customer t3 ON t1.customer_number = t3.customer_number AND t3.expiry_date = ${hivevar:max_date} WHERE t1.customer_street_address <> t2.customer_street_address AND t3.customer_sk IS NULL ) t1 CROSS JOIN (**SELECT COALESCE(MAX(customer_sk),0) sk_max FROM dw_sales.dim_customer**) t2; the key implementation is: **row_number() over (ORDER BY t1.customer_number)** + **t2.sk_max** via such solution, hive can handle surrogate key gerneration. **In summery:** because slow change dimention is an important feature in dataware house modeling, so it's suggested to support surrogate key gerneration in doris.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
