Re: Partition Date Issue

张佑铖 Mon, 21 May 2018 08:19:00 -0700

Hi Shaofeng,

I'm not sure, but could creating a view that adds a new column combining
the three columns simplify the problem?
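
To make the idea concrete, here is a rough sketch that generates the HiveQL
for such a view. The table name, view name and the PART_DT column name are
hypothetical, and it assumes YEAR, MONTH and DAY can be cast to strings.
Whether Hive still prunes partitions when filtering through the view is not
something this sketch establishes.

```python
def combined_partition_view_sql(table: str, view: str, column: str = "PART_DT") -> str:
    """Build a CREATE VIEW statement that adds one yyyy-MM-dd string column.

    All names passed in are placeholders chosen by the caller.
    """
    return (
        f"CREATE VIEW IF NOT EXISTS {view} AS\n"
        f"SELECT t.*,\n"
        # LPAD zero-pads month and day so the value sorts like a date string.
        f"       CONCAT(CAST(t.YEAR AS STRING), '-',\n"
        f"              LPAD(CAST(t.MONTH AS STRING), 2, '0'), '-',\n"
        f"              LPAD(CAST(t.DAY AS STRING), 2, '0')) AS {column}\n"
        f"FROM {table} t"
    )


if __name__ == "__main__":
    print(combined_partition_view_sql("events", "events_with_part_dt"))
```

The new column could then be tried as the "Partition Date Column" of the
model.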



Thank you
________________________________
From: ShaoFeng Shi <[email protected]>
Sent: May 16, 2018 20:00
To: dev
Cc: user
Subject: Re: Partition Date Issue

Hi Debdutto,

To support different partitioning schemes, Kylin has an
"IPartitionConditionBuilder" interface, and there is already an
implementation for exactly this case of three columns, "YEAR", "MONTH"
and "DAY". Please check:

https://github.com/apache/kylin/blob/master/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L301

The implementation concatenates the three columns and compares the result
with the given dates, for example:

CONCAT(FACT.YEAR, FACT.MONTH, FACT.DAY) >= '2018-01-01' AND CONCAT(FACT.YEAR,
FACT.MONTH, FACT.DAY) < '2018-01-02'
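
As an illustration only (this is not the Kylin source), the shape of such a
condition can be sketched in a few lines. The function name is hypothetical.
The sketch inserts '-' separators so that the concatenated string lines up
with a yyyy-MM-dd literal; that detail, and the assumption that MONTH and DAY
are stored zero-padded, are mine. Please check the linked file for what the
builder really emits.

```python
from datetime import date
from typing import Optional


def build_date_range_condition(table_alias: str, start: Optional[date], end: date) -> str:
    """Return a half-open [start, end) filter over YEAR/MONTH/DAY columns."""
    # Assumption: '-' separators, so the value has the same shape as 'yyyy-MM-dd'.
    concat_field = (
        f"CONCAT({table_alias}.YEAR,'-',{table_alias}.MONTH,'-',{table_alias}.DAY)"
    )
    parts = []
    if start is not None:
        # The lower bound is inclusive and is skipped when there is no start.
        parts.append(f"{concat_field} >= '{start.isoformat()}'")
    # The upper bound is exclusive.
    parts.append(f"{concat_field} < '{end.isoformat()}'")
    return " AND ".join(parts)


if __name__ == "__main__":
    print(build_date_range_condition("FACT", date(2018, 1, 1), date(2018, 1, 2)))
```

Because the comparison is between strings, it only behaves like a date
comparison when both sides have the same fixed-width format.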

However, the Kylin UI has no widget to enable this builder. You need to
manually modify the metadata of the Data Model with the "bin/metastore.sh"
tool, changing "partition_condition_builder", for example:

"partition_desc" : {
  "partition_date_column" : "KYLIN_SALES.PART_DT",
  "partition_time_column" : null,
  "partition_date_start" : 1325376000000,
  "partition_date_format" : "yyyy-MM-dd",
  "partition_time_format" : "HH:mm:ss",
  "partition_type" : "APPEND",
  "partition_condition_builder" : "org.apache.kylin.metadata.model.PartitionDesc$YearMonthDayPartitionConditionBuilder"
}
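
If you prefer to script the change rather than edit by hand, the edit itself
is a single field. A minimal sketch, assuming the model's JSON has already
been exported to a string or local file (exporting it and loading it back
with bin/metastore.sh is not covered here); the function name is
hypothetical.

```python
import json

# Class name taken from the example above.
BUILDER = "org.apache.kylin.metadata.model.PartitionDesc$YearMonthDayPartitionConditionBuilder"


def set_partition_builder(model_json: str, builder: str = BUILDER) -> str:
    """Return the model JSON with partition_condition_builder replaced."""
    model = json.loads(model_json)
    if "partition_desc" not in model:
        # Refuse to invent a partition_desc; the model should already have one.
        raise KeyError("model has no partition_desc section")
    model["partition_desc"]["partition_condition_builder"] = builder
    return json.dumps(model, indent=2)


if __name__ == "__main__":
    sample = '{"partition_desc": {"partition_date_column": "KYLIN_SALES.PART_DT"}}'
    print(set_partition_builder(sample))
```

All other fields are left untouched, so the result can be compared against
the original before it is written back.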






2018-05-15 21:42 GMT+08:00 Debdutto Chakraborty <[email protected]>:

> Hi,
>
> So, we have a Hive table with analytical events data (impressions, clicks,
> conversions and such). A typical day produces around 50 to 100 million rows
> in this table, which has around 30 columns.
>
> We were trying to move to Kylin and prepare cubes from the data which is in
> this table.
>
> Now the problem is:
>
>    1. This Hive table is partitioned on YEAR, MONTH and DAY, which are
>    separate columns.
>    2. Kylin does not accept such separate columns as "Partition Date
>    Column".
>    3. Running Hive queries on non-partitioned columns is a nightmare.
>
>
> The only solution I see is to give the user an option during
> configuration to specify separate columns like this, and then build the
> query accordingly.
>
> My only concern is whether this will impact the cube's "Refresh Settings".
>
> Please let me know if this should be done. I'm happy to do the development
> and open a PR.
>
> Regards,
> Debdutto Chakraborty
>



--
Best regards,

Shaofeng Shi 史少锋
