Hi Chao Long,

Yes!
#
That is why I wrote “has provided” below:
> At the same time, Kylin should support a custom column for sharding. (has 
> provided)

#
But Kylin could insert a rand column into the intermediate Hive table for the 
next shard step (as the default).
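
To illustrate the difference, here is a small Python simulation (not Kylin 
code; the reducer count, row count, and field values are all made up). It 
assumes each row goes to reducer hash(key) % reducers, which is only a 
simplification of what Hive does for DISTRIBUTE BY:

```python
import random
from collections import Counter

NUM_REDUCERS = 8   # made-up reducer count
ROWS = 100_000     # made-up row count


def reducer_loads(keys, num_reducers=NUM_REDUCERS):
    """Rows per reducer when each row goes to hash(key) % num_reducers."""
    counts = Counter(hash(key) % num_reducers for key in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


rng = random.Random(42)

# Case 1: distribute by three dimension fields where one combination dominates.
# The first 90% of rows share the same (Field1, Field2, Field3) values.
dominant = [(1, 1, 1)] * 90_000
others = [(rng.randint(2, 5), rng.randint(0, 3), rng.randint(0, 3))
          for _ in range(ROWS - 90_000)]
skewed_loads = reducer_loads(dominant + others)

# Case 2: a random integer column written once when the flat table is built,
# then used as the distribute key. Re-reading it gives the same assignment.
shard_column = [rng.randrange(1024) for _ in range(ROWS)]
even_loads = reducer_loads(shard_column)

print("by dimension fields:  ", skewed_loads)
print("by stored rand column:", even_loads)
```

In the first case one reducer receives at least 90% of the rows; in the 
second the rows spread roughly evenly, and because the column is stored, the 
assignment does not change when the table is read again.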

Best Wishes!

> On Nov 2, 2018, at 4:03 PM, Chao Long <wayn...@qq.com> wrote:
> 
> Hi zhixin,
>   As I remember, if you set a "shard by" column on the cube design page, 
> Kylin will use that column as the "distribute by" condition, rather than the 
> first three fields of the rowkey.
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "liuzhixin"<liuz...@163.com>;
> Sent: Friday, November 2, 2018, 3:11 PM
> To: "dev"<dev@kylin.apache.org>;
> Cc: "Chao Long"<wayn...@qq.com>; 
> Subject: Re: Redistribute intermediate table default not by rand()
> 
> 
> 
> Hi Chao Long,
> 
> Thank you for the answer.
> #
> Step1: Create Intermediate Flat Hive Table
> Step2: Redistribute intermediate table
> #
> Perhaps Kylin could insert a rand column into the intermediate Hive table for 
> the next shard step (as the default).
> At the same time, Kylin should support a custom column for sharding. (has 
> provided)
> 
> Best Wishes.
> 
>> On Nov 2, 2018, at 1:38 PM, Chao Long <wayn...@qq.com> wrote:
>> 
>> Hi zhixin,
>> Data may become incorrect if you use "distribute by rand()".
>> https://issues.apache.org/jira/browse/KYLIN-3388
>> 
>> 
>> 
>> 
>> ------------------ Original Message ------------------
>> From: "liuzhixin"<liuz...@163.com>;
>> Sent: Friday, November 2, 2018, 12:53 PM
>> To: "dev"<dev@kylin.apache.org>;
>> Cc: "ShaoFeng Shi"<shaofeng...@apache.org>; 
>> Subject: Re: Redistribute intermediate table default not by rand()
>> 
>> 
>> 
>> Hi kylin team:
>> 
>> Step: Redistribute intermediate table
>> #
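>> By default, the first three dimension fields are used as the DISTRIBUTE BY 
>> keys, rather than DISTRIBUTE BY RAND().
>> If there are no suitable dimension fields, this default strategy will make 
>> the data even more skewed.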
>> 
>> Best Regards!
>> 
>>> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuz...@163.com> wrote:
>>> 
>>> Hi kylin team:
>>> 
>>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>>> #
>>> Step: Redistribute intermediate table
>>> #
>>> The DISTRIBUTE BY statement is:
>>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate 
>>> DISTRIBUTE BY Field1, Field2, Field3;
>>> #
>>> It does not use DISTRIBUTE BY RAND().
>>> #
>>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use 
>>> DISTRIBUTE BY RAND() instead?
>>> 
>>> Best wishes.

