Hi Shaofeng, I'm not sure, but could creating a view that adds a new column combining the three columns simplify the problem?
Thank you

________________________________
From: ShaoFeng Shi <[email protected]>
Sent: May 16, 2018 20:00
To: dev
Cc: user
Subject: Re: Partition Date Issue

Hi Debdutto,

To support different partitioning schemes, Kylin has an "IPartitionConditionBuilder" interface. There is already an implementation for the three-column case ("YEAR", "MONTH", "DAY"); please check:

https://github.com/apache/kylin/blob/master/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L301

The implementation concatenates the three columns and compares the result with the given dates, for example:

    CONCAT(FACT.YEAR, FACT.MONTH, FACT.DAY) >= '2018-01-01' AND CONCAT(FACT.YEAR, FACT.MONTH, FACT.DAY) < '2018-01-02'

However, the Kylin UI has no widget to enable this builder. You need to manually modify the metadata of the Data Model with the "bin/metastore.sh" tool and set "partition_condition_builder", for example:

    "partition_desc" : {
      "partition_date_column" : "KYLIN_SALES.PART_DT",
      "partition_time_column" : null,
      "partition_date_start" : 1325376000000,
      "partition_date_format" : "yyyy-MM-dd",
      "partition_time_format" : "HH:mm:ss",
      "partition_type" : "APPEND",
      "partition_condition_builder" : "org.apache.kylin.metadata.model.PartitionDesc$YearMonthDayPartitionConditionBuilder"
    }

2018-05-15 21:42 GMT+08:00 Debdutto Chakraborty <[email protected]>:

> Hi,
>
> So, we have a hive table with analytical events data (impressions, clicks,
> conversions and such). A typical day produces around 50 to 100 million rows
> in this table with around 30 columns.
>
> We were trying to move to Kylin and prepare cubes from the data which is in
> this table.
>
> Now the problem is:
>
> 1. This hive table is partitioned on YEAR, MONTH, DAY columns. Which are
> separate columns.
> 2. Kylin does not accept such separate columns as "Partition Date
> Column".
> 3. Running Hive queries on non partitioned columns is a nightmare.
>
>
> The only solution to this that I see is that give the user an option during
> configuration to specify separate columns like this and then create the
> query accordingly.
>
> My only concern is that if this will impact the cube's "Refresh Settings"
>
> Please let me know if this should be done. I'm open to do the development
> and open a PR.
>
> Regards,
> Debdutto Chakraborty
>

--
Best regards,

Shaofeng Shi 史少锋
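
For illustration, the kind of date-range condition described in the reply above can be sketched in a few lines of Python. This is only a rough sketch and not Kylin's actual Java implementation: the function name `build_date_range_condition` is hypothetical, and joining the parts with '-' is an assumption made here so that the concatenated string has the same 'yyyy-MM-dd' shape as the date literals it is compared with (which also assumes MONTH and DAY are stored zero-padded).

```python
from datetime import date


def build_date_range_condition(table_alias: str, start: date, end: date) -> str:
    """Return a half-open [start, end) filter over separate YEAR, MONTH, DAY columns.

    Hypothetical helper for illustration only; not part of Kylin.
    """
    # Assumption: the parts are joined with '-' so the result looks like
    # 'yyyy-MM-dd'. This only compares correctly as a string if MONTH and DAY
    # are zero-padded (e.g. '05', not '5').
    concat = (
        f"CONCAT({table_alias}.YEAR, '-', {table_alias}.MONTH, '-', {table_alias}.DAY)"
    )
    # date.isoformat() produces 'YYYY-MM-DD', matching the assumed column format.
    return f"{concat} >= '{start.isoformat()}' AND {concat} < '{end.isoformat()}'"


# Build the filter for a single day, 2018-01-01 inclusive to 2018-01-02 exclusive.
condition = build_date_range_condition("FACT", date(2018, 1, 1), date(2018, 1, 2))
print(condition)
```

The half-open range (>= start, < end) mirrors the example in the reply, so consecutive day segments do not overlap.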
