Hi Gang,

Kylin leverages Hadoop to do the cube building and leverages HBase to
serve the runtime queries. Most of the computing happens in the Hadoop and
HBase clusters, both of which are scalable, so the performance very likely
depends on the cluster's size. Besides that, some other aspects can impact
the performance, like the cube complexity, dimension cardinality, etc. We
couldn't simply answer yes or no to your question without more detailed
inputs.
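
To illustrate why cube complexity matters: a full cube over N dimensions
can materialize up to 2^N cuboids, so each additional dimension may
roughly double the build work (Kylin can prune some of these, e.g. via
aggregation groups). A back-of-the-envelope sketch in Python, with
illustrative numbers only:

    # Illustrative only: a full cube over N dimensions has up to 2**N
    # cuboids (one per subset of dimensions).
    def full_cuboid_count(num_dimensions: int) -> int:
        return 2 ** num_dimensions

    for n in (4, 8, 12):
        print(f"{n} dimensions -> up to {full_cuboid_count(n)} cuboids")
    # 4 dimensions  -> up to 16 cuboids
    # 8 dimensions  -> up to 256 cuboids (as in the case below)
    # 12 dimensions -> up to 4096 cuboids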


Liang Meng's case is very good: he listed the cluster size, cube
dimensions, data cardinality, etc. Thanks Meng, this is a good reference
for all users. (Next time, if you could write it in English, that would be
great.)

Please allow me to translate it:

30 million records per day is a small case; let me give you our case:
50 nodes;
6 billion records per day;
5 lookup tables, 8 dimensions;
One of the dimensions has more than 10 million distinct values
(cardinality); the other dimensions' cardinality ranges from tens of
thousands to hundreds of thousands;
Building one day's data takes about 200 minutes.

Most of the time is spent on:
1. Extracting data from Hive (creating the flat intermediate table): since
we restrict Hive to using less than 10% of the cluster's capacity, this
step is slow, usually taking about 1 hour.
2. Creating the HFiles, which takes a bit more than 1 hour.
The other steps take about 1 hour in total.
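
For a rough sense of scale, here is a quick sanity check of those figures
(a back-of-the-envelope sketch in Python, using only the numbers quoted
above, nothing measured by me):

    # Rough arithmetic from the case above: 6B records/day, 50 nodes,
    # ~200 minutes to build one day's data.
    records_per_day = 6_000_000_000
    build_minutes = 200
    nodes = 50

    overall_rate = records_per_day / build_minutes   # records per minute
    per_node_rate = overall_rate / nodes             # per node per minute
    print(f"~{overall_rate:,.0f} records/min overall")    # ~30,000,000
    print(f"~{per_node_rate:,.0f} records/min per node")  # ~600,000

    # The time budget roughly matches the breakdown: ~60 min (Hive
    # extraction) + ~70 min (HFile creation) + ~70 min (other steps)
    # is about 200 min.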


On 8/12/15, 3:18 PM, "liangmeng" <[email protected]> wrote:

>30 million a day is quite a small case; let me give you one of ours:
>50 nodes
>6 billion records per day
>5 lookup tables, 8 dimensions
>One of the dimensions is at the ten-million level (cardinality); the
>other dimensions all range from tens of thousands to hundreds of thousands
>Building one day's data takes about 200 minutes;
>
>Most of the time is spent on:
>1. Extracting data from the Hive table: because we restrict Hive to only
>10% of the whole cluster's resources, this step is relatively slow and
>takes about 1 hour;
>2. Generating the HBase HFiles for the cube at the end, which takes a bit
>over 1 hour;
>The other aggregation steps also take a little over 1 hour.
>
>
>
>Liang Meng
>China Mobile Guangdong Company, Network Management Maintenance Center,
>Network Management Support Office
>Tel: 13802880779
>Email: [email protected]; [email protected]
>Address: 3rd Floor North, Guangdong Quanqiutong Building, No. 11 Zhujiang
>West Road, Zhujiang New Town, Guangzhou, Guangdong
>Postal code: 510623
> 
>From: Li Gang
>Sent: 2015-08-12 14:19
>To: dev
>Subject: Kylin performance question
>
>Hello, have you tested Kylin's performance? We have 30 million records
>per day that need to be merged to build the cube for front-end queries.
>The build should not take very long, roughly within 3 hours. Can Kylin
>handle this? What do your actual test results look like?
