Hi Gang,

Kylin leverages Hadoop for cube building and HBase to serve runtime queries. Most of the computing happens in the Hadoop and HBase clusters, and both are scalable, so performance largely depends on the cluster's size. Other aspects also affect performance, such as cube complexity and dimension cardinality. Without those details as input, we can't give a simple yes-or-no answer to your question.
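For context on the query side, here is a minimal sketch of issuing SQL through Kylin's JDBC driver; the host, project, table, and credentials below are placeholders, not from any real deployment. An aggregate query like this is answered from the pre-built cube in HBase rather than by scanning the raw data:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.Properties;

    public class KylinQueryExample {
        public static void main(String[] args) throws Exception {
            // Kylin ships a JDBC driver; host, port, project, and
            // credentials here are placeholders.
            Class.forName("org.apache.kylin.jdbc.Driver");
            Properties props = new Properties();
            props.put("user", "ADMIN");
            props.put("password", "KYLIN");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:kylin://kylin-host:7070/my_project", props);
                 Statement stmt = conn.createStatement();
                 // Served from the cube, so latency is largely independent
                 // of how many raw records were aggregated at build time.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT part_dt, COUNT(*) FROM kylin_sales GROUP BY part_dt")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
                }
            }
        }
    }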
Liang Meng's case is a very good one: he listed the cluster size, cube dimensions, data cardinality, etc. Thanks Meng, this is a good reference for all users. (Next time, if you can write it in English, that would be great.) Please allow me to translate it:

30 million records per day is a small case. Let me give you our case: 50 nodes; 6 billion records per day; 5 lookup tables, 8 dimensions. One of the dimensions has more than 10 million distinct values (cardinality); the other dimensions' cardinalities range from tens of thousands to hundreds of thousands. Building one day's data takes about 200 minutes.

Most of the time is spent on:
1. Extracting data from Hive (creating the flat intermediate table): since we restrict Hive to less than 10% of the cluster's capacity, this step is slow, usually taking about 1 hour.
2. Creating the HFiles, which takes a bit more than 1 hour.
The other steps take about 1 hour in total.

(A sketch of how one might check a dimension's cardinality up front is after the quoted thread below.)

On 8/12/15, 3:18 PM, "liangmeng" <[email protected]> wrote:

>30 million a day is a small case; let me give you ours:
>50 nodes
>6 billion records per day
>5 lookup tables, 8 dimensions
>One dimension is at the tens-of-millions scale; the other dimensions range from tens of thousands to hundreds of thousands
>Building one day's data takes roughly 200 minutes.
>
>Most of the time is spent on:
>1. Extracting data from the Hive table: because we restrict Hive to 10% of the whole cluster's resources, this step is relatively slow, taking about 1 hour;
>2. Generating the HBase HFiles for the cube at the end, which takes a bit more than 1 hour;
>The remaining aggregation steps also add up to a little over 1 hour.
>
>
>
>Liang Meng
>China Mobile Guangdong, Network Management & Maintenance Center, Network Management Support Office
>Phone: 13802880779
>Email: [email protected], [email protected]
>Address: 3/F North, Guangdong GoTone Building, No. 11 Zhujiang West Road, Zhujiang New Town, Guangzhou, Guangdong
>Postal code: 510623
>
>From: 李刚 (Li Gang)
>Sent: 2015-08-12 14:19
>To: dev
>Subject: Kylin performance question
>
>Hello, have you tested Kylin's performance? We have about 30 million records per day that need to be merged and built into a cube for front-end queries. The build time should not be long, roughly within 3 hours. Can Kylin handle this? What are your measured results?
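Since dimension cardinality drives both cube size and build time, it is worth measuring before designing a cube. Below is a minimal sketch that checks a candidate dimension's distinct count through the standard HiveServer2 JDBC driver; the host, credentials, table, and column names are placeholders, not from Meng's case. (Kylin also estimates column cardinality when a Hive table is loaded, so treat this as a quick manual check.)

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DimensionCardinalityCheck {
        public static void main(String[] args) throws Exception {
            // HiveServer2 JDBC; hostname, database, table, and column
            // names are placeholders.
            Class.forName("org.apache.hive.jdbc.HiveDriver");
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:hive2://hive-host:10000/default", "hive", "");
                 Statement stmt = conn.createStatement();
                 // COUNT(DISTINCT ...) gives the exact cardinality of a
                 // candidate dimension; a result in the tens of millions,
                 // like the dimension in Meng's case, is a warning sign
                 // for cube size and build time.
                 ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(DISTINCT user_id) FROM fact_table")) {
                if (rs.next()) {
                    System.out.println("Cardinality of user_id: " + rs.getLong(1));
                }
            }
        }
    }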
