I think the solution should work; let's open an issue and resolve it later: https://issues.apache.org/jira/browse/KYLIN-1323
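In short, the proposal discussed in the thread below is to calculate more split keys than regions, partition the 'convert to hfile' reducers by the full key list, and use only a sampled subset of those keys when actually creating the HTable regions. A minimal sketch of the sampling step, assuming the full split-key list has already been computed (class and method names are illustrative, not Kylin's actual code):

import java.util.ArrayList;
import java.util.List;

public class SplitKeySamplingSketch {

    // Keep every n-th split key for HTable region creation; the full list
    // would still be used to partition the "convert to hfile" reducers.
    static List<byte[]> sampleForRegions(List<byte[]> allSplitKeys, int hfilesPerRegion) {
        List<byte[]> regionSplits = new ArrayList<>();
        for (int i = hfilesPerRegion - 1; i < allSplitKeys.size(); i += hfilesPerRegion) {
            regionSplits.add(allSplitKeys.get(i));
        }
        return regionSplits;
    }

    public static void main(String[] args) {
        // 99 split keys -> 100 reducers; keeping every 10th key -> 9 keys -> 10 regions.
        List<byte[]> all = new ArrayList<>();
        for (int i = 1; i <= 99; i++) {
            all.add(new byte[] { (byte) i });
        }
        List<byte[]> regionKeys = sampleForRegions(all, 10);
        System.out.println("reducers: " + (all.size() + 1) + ", regions: " + (regionKeys.size() + 1));
    }
}

With the numbers from the thread (100GB of cuboid output at 10GB per region), this gives 100 reducers writing 100 HFiles while still creating only 10 regions, so each region bulk-loads 10 files.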
> On Jan 15, 2016, at 15:57, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> For Meng's case, writing 5GB takes 40 minutes; that's really slow. The
> bottleneck should be the HDFS write (the cuboids have already been
> calculated; that step just converts them to HFile format, with no other
> computation).
>
> 2016-01-15 15:36 GMT+08:00 hongbin ma <mahong...@apache.org>:
>
>> If it works I'd love to see the change.
>>
>> On Fri, Jan 15, 2016 at 3:35 PM, hongbin ma <mahong...@apache.org> wrote:
>>
>>> I'm not sure if it will work; does HBase bulk load allow that?
>>>
>>> On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun <sunye...@gmail.com> wrote:
>>>
>>>> hongbin,
>>>>
>>>> I understand how the number of reducers is determined, and it could be
>>>> improved.
>>>>
>>>> Suppose we get 100GB of data after cuboid building, with a setting of
>>>> 10GB per region. For now, 10 split keys are calculated, 10 regions are
>>>> created, and 10 reducers are used in the 'convert to hfile' step.
>>>>
>>>> With the optimization, we could calculate 100 (or more) split keys and
>>>> use all of them in the 'convert to hfile' step, but sample 10 of them to
>>>> create the regions. The result is still 10 regions created, but 100
>>>> reducers used in the 'convert to hfile' step. Of course, 100 HFiles are
>>>> also created, so each region loads 10 files. That should be fine and
>>>> shouldn't affect query performance dramatically.
>>>>
>>>>> On Jan 15, 2016, at 13:53, hongbin ma <mahong...@apache.org> wrote:
>>>>>
>>>>> Hi Yerui,
>>>>>
>>>>> The reason the number of "convert to hfile" reducers is small is that
>>>>> each reducer's output will become an HTable region. Too many regions
>>>>> would be a burden on the HBase cluster. In our production environment
>>>>> we have cubes that are 10T+; guess how many regions they would populate?
>>>>>
>>>>> What's more, Kylin provides different profiles to control the expected
>>>>> region size (thus controlling the number of regions and the parallelism
>>>>> of the "convert to hfile" reducers), and you can choose one depending on
>>>>> your cube size. In 2.x it's basically 10G for small cubes, 20G for
>>>>> medium cubes, and 100G for large cubes. However, this is manual work
>>>>> when creating the cube, and I admit the value settings for the three
>>>>> profiles are still open for discussion.
>>>>>
>>>>> On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:
>>>>>
>>>>>> Agreed with 梁猛 (Liang Meng).
>>>>>>
>>>>>> Actually, we found the same issue: the number of reducers in the
>>>>>> 'convert to hfile' step is too small, since it is the same as the
>>>>>> region count.
>>>>>>
>>>>>> I think we could increase the number of reducers to improve
>>>>>> performance. If anyone is interested in this, we could discuss the
>>>>>> solution further.
>>>>>>
>>>>>>> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
>>>>>>>
>>>>>>> Actually, I found the last step, "convert to hfile", takes too much
>>>>>>> time: more than 40 minutes for a single region (using the 'small'
>>>>>>> profile; the result file is about 5GB).
>>>>>>>
>>>>>>> China Mobile Guangdong Co., Ltd., Network Management Center, 梁猛 (Liang Meng)
>>>>>>> 13802880...@139.com
>>>>>>>
>>>>>>> From: ShaoFeng Shi
>>>>>>> Date: 2016-01-15 09:40
>>>>>>> To: dev
>>>>>>> Subject: Re: beg suggestions to speed up the Kylin cube build
>>>>>>>
>>>>>>> The cube build performance is largely determined by your Hadoop
>>>>>>> cluster's capacity. You can inspect the MR jobs' statistics to analyze
>>>>>>> the potential bottlenecks.
>>>>>>>
>>>>>>> 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> We are trying to build a nine-dimension cube: eight mandatory
>>>>>>>> dimensions and one hierarchy dimension. The fact table is about 20G,
>>>>>>>> and the two lookup tables are 1.3M and 357K respectively. It takes
>>>>>>>> about 3 hours to reach 30% progress, which is rather slow.
>>>>>>>>
>>>>>>>> We'd like to know whether there are suggestions to speed up the
>>>>>>>> Kylin cube build. We got a suggestion from a slide that said to sort
>>>>>>>> the dimensions by cardinality. Are there any other ways we can try?
>>>>>>>>
>>>>>>>> We also noticed that only half of the memory and half of the CPU are
>>>>>>>> used during the cube build. Are there any ways to fully utilize the
>>>>>>>> resources?
>>>>>>>>
>>>>>>>> Looking forward to hearing from you.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Zhong
>>>>>>>
>>>>>>> --
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Shaofeng Shi
>>>>>
>>>>> --
>>>>> Regards,
>>>>>
>>>>> *Bin Mahone | 马洪宾*
>>>>> Apache Kylin: http://kylin.io
>>>>> Github: https://github.com/binmahone
>>>
>>> --
>>> Regards,
>>>
>>> *Bin Mahone | 马洪宾*
>>> Apache Kylin: http://kylin.io
>>> Github: https://github.com/binmahone
>>
>> --
>> Regards,
>>
>> *Bin Mahone | 马洪宾*
>> Apache Kylin: http://kylin.io
>> Github: https://github.com/binmahone
>
> --
> Best regards,
>
> Shaofeng Shi
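A side note on the small/medium/large profiles mentioned above: in Kylin releases of that era they correspond to region-cut settings in kylin.properties, with the cut size given in GB. The exact property names vary between versions, so the lines below are only illustrative; check the kylin.properties shipped with your installation:

# Illustrative values; names may differ per Kylin version.
kylin.hbase.region.cut.small=10
kylin.hbase.region.cut.medium=20
kylin.hbase.region.cut.large=100

Choosing the profile that matches the expected cube size controls how many regions, and therefore how many 'convert to hfile' reducers, are created.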