Re: beg suggestions to speed up the Kylin cube build

hongbin ma Thu, 14 Jan 2016 23:37:29 -0800

If it works, I'd love to see the change.

On Fri, Jan 15, 2016 at 3:35 PM, hongbin ma <mahong...@apache.org> wrote:


> I'm not sure it will work. Does HBase bulk load allow that?
>
> On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun <sunye...@gmail.com> wrote:
>
>> Hongbin,
>>
>> I understand how the number of reducers is determined, and it could be
>> improved.
>>
>> Suppose we get 100GB of data after cuboid building, with the region size
>> set to 10GB. Currently, 10 split keys are calculated, 10 regions are
>> created, and 10 reducers are used in the ‘convert to hfile’ step.
>>
>> With the optimization, we could calculate 100 (or more) split keys and use
>> all of them in the ‘convert to hfile’ step, but sample 10 of them to create
>> the regions. The result is still 10 regions, but 100 reducers are used in
>> the ‘convert to hfile’ step. Of course, 100 hfiles are also created, and 10
>> files are loaded per region. That should be fine and shouldn't affect query
>> performance dramatically.
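A minimal sketch of Yerui's proposal, assuming the row keys are already sorted and that each bulk-loaded HFile should stay inside a single region's key range. It computes fine-grained reducer boundaries, then keeps every k-th one as a region boundary, so every region boundary is also a reducer boundary. The helper names (`quantile_split_keys`, `region_split_keys`, `partition_of`) and the synthetic keys are hypothetical illustrations, not Kylin's actual code.

```python
from bisect import bisect_right
from typing import List, Sequence


def quantile_split_keys(sorted_keys: Sequence[bytes], n_parts: int) -> List[bytes]:
    """Pick n_parts - 1 boundary keys that cut sorted_keys into n_parts near-equal ranges."""
    if n_parts < 1 or n_parts > len(sorted_keys):
        raise ValueError("n_parts must be between 1 and len(sorted_keys)")
    step = len(sorted_keys) / n_parts
    return [sorted_keys[int(i * step)] for i in range(1, n_parts)]


def region_split_keys(reducer_keys: List[bytes], reducers_per_region: int) -> List[bytes]:
    """Keep every k-th reducer boundary, so each region boundary is also a reducer boundary."""
    return reducer_keys[reducers_per_region - 1::reducers_per_region]


def partition_of(key: bytes, boundaries: List[bytes]) -> int:
    """Index of the range holding key: the number of boundaries <= key."""
    return bisect_right(boundaries, key)


# Hypothetical data: 100,000 sorted row keys, 100 reducers, 10 reducers per region.
keys = [f"row{i:06d}".encode() for i in range(100_000)]
reducer_keys = quantile_split_keys(keys, 100)      # 99 boundaries -> 100 reducers / HFiles
region_keys = region_split_keys(reducer_keys, 10)  # 9 boundaries -> 10 regions

# Every HFile (one per reducer) falls entirely inside one region: region = reducer // 10.
assert all(partition_of(k, region_keys) == partition_of(k, reducer_keys) // 10 for k in keys)
print(len(reducer_keys) + 1, "reducers,", len(region_keys) + 1, "regions")
```

Because the region boundaries are a subset of the reducer boundaries by construction, no reducer's output straddles two regions, which matches the 100-reducer, 10-region example.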
>>
>> > On Jan 15, 2016, at 13:53, hongbin ma <mahong...@apache.org> wrote:
>> >
>> > Hi Yerui,
>> >
>> > The reason the number of "convert to hfile" reducers is small is that
>> > each reducer's output becomes an htable region. Too many regions will be
>> > a burden on the hbase cluster. In our production environment we have
>> > cubes of 10T+; guess how many regions they would produce?
>> >
>> > What's more, Kylin provides different profiles to control the expected
>> > region size (thus controlling the number of regions and the parallelism
>> > of the "create htable" reducer), which you can choose depending on your
>> > cube size. In 2.x it's basically 10G for small cubes, 20G for medium
>> > cubes and 100G for large cubes. However, this is manual work when
>> > creating a cube, and I admit the values for the three profiles are still
>> > open to discussion.
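A back-of-the-envelope sketch of how the region-size profile sets the region count (and hence the number of "convert to hfile" reducers), using the 10G/20G/100G figures quoted for 2.x. The `REGION_SIZE_GB` table and the `estimated_regions` helper are hypothetical, not Kylin configuration keys; real region counts depend on Kylin's own size estimates.

```python
import math

# Hypothetical lookup of the 2.x profile sizes quoted in the thread (GB per region).
REGION_SIZE_GB = {"small": 10, "medium": 20, "large": 100}


def estimated_regions(cube_size_gb: float, profile: str) -> int:
    """Approximate region (and reducer) count: cube size divided by the profile's region size."""
    return max(1, math.ceil(cube_size_gb / REGION_SIZE_GB[profile]))


# A 10T cube (10 * 1024 GB) under each profile.
for name in REGION_SIZE_GB:
    print(name, estimated_regions(10 * 1024, name))  # small 1024, medium 512, large 103
```

Even the largest profile leaves a 10T cube with roughly a hundred regions, which illustrates the trade-off described here: larger regions mean fewer regions, but also fewer reducers.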
>> >
>> >
>> >
>> >
>> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:
>> >
>> >> Agreed with 梁猛.
>> >>
>> >> Actually, we found the same issue: the number of reducers in the
>> >> ‘convert to hfile’ step is too small, since it equals the region count.
>> >>
>> >> I think we could increase the number of reducers to improve performance.
>> >> If anyone is interested in this, we could discuss the solution further.
>> >>
>> >>> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
>> >>>
>> >>> Actually, I found that the last step, "convert to hfile", takes too much
>> >>> time: more than 40 minutes for a single region (using the small profile,
>> >>> with a result file of about 5GB).
>> >>>
>> >>>
>> >>>
>> >>> China Mobile Guangdong Co., Ltd., Network Management Center, 梁猛
>> >>> 13802880...@139.com
>> >>>
>> >>> From: ShaoFeng Shi
>> >>> Date: 2016-01-15 09:40
>> >>> To: dev
>> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
>> >>> Cube build performance is largely determined by your Hadoop cluster's
>> >>> capacity. You can inspect the MR jobs' statistics to analyze the
>> >>> potential bottlenecks.
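One simple way to act on this advice, sketched under assumptions: collect per-task durations for a step (for example, read off the MR job's task list) and flag tasks that run far longer than the median, which can point at data skew or too few reducers. The input format, the task IDs and the 2x threshold are hypothetical choices, not output of any Hadoop tool.

```python
from statistics import median
from typing import Dict, List


def find_stragglers(task_seconds: Dict[str, float], factor: float = 2.0) -> List[str]:
    """Return task IDs that ran more than `factor` times the median task duration."""
    if not task_seconds:
        return []
    typical = median(task_seconds.values())
    return sorted(task for task, secs in task_seconds.items() if secs > factor * typical)


# Hypothetical reducer durations (seconds) copied by hand from a job's task list.
durations = {"r_000": 300, "r_001": 320, "r_002": 2500, "r_003": 310}
print(find_stragglers(durations))  # ['r_002']
```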
>> >>>
>> >>>
>> >>>
>> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
>> >>>
>> >>>> Hi All,
>> >>>>
>> >>>> We are trying to build a nine-dimension cube:
>> >>>> eight mandatory dimensions and one hierarchy
>> >>>> dimension. The fact table is about 20G, and the two
>> >>>> lookup tables are 1.3M and 357k respectively. It
>> >>>> takes about 3 hours to reach 30% progress, which is
>> >>>> rather slow.
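Mandatory and hierarchy dimensions are Kylin's way of pruning cuboids, so a quick count shows how many cuboids a cube like this produces. A minimal sketch under a simplified single-aggregation-group model: mandatory dimensions appear in every cuboid (a factor of 1), and a hierarchy of L levels contributes only its L + 1 prefixes instead of 2^L subsets. The number of hierarchy levels is not stated in the message, so 3 is an assumption, and the helper names are hypothetical.

```python
from math import prod
from typing import Sequence


def full_cuboids(total_columns: int) -> int:
    """Cuboids in an unpruned cube: every subset of the dimension columns."""
    return 2 ** total_columns


def pruned_cuboids(normal_dims: int, hierarchy_levels: Sequence[int]) -> int:
    """Cuboids when mandatory dims are always present (factor 1) and each hierarchy
    of L levels contributes only its L + 1 prefixes instead of 2 ** L subsets."""
    return 2 ** normal_dims * prod(levels + 1 for levels in hierarchy_levels)


# Hypothetical reading of the cube: 8 mandatory dims plus one hierarchy with 3 levels.
print(full_cuboids(8 + 3))     # 2048
print(pruned_cuboids(0, [3]))  # 4
```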
>> >>>>
>> >>>> We'd like to know whether there are any suggestions
>> >>>> for speeding up the Kylin cube build. One suggestion
>> >>>> we found in a slide deck was to sort the dimensions
>> >>>> by cardinality. Are there any other ways we can try?
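The slide's suggestion amounts to ordering the cube's dimensions (presumably in the rowkey) by their number of distinct values. A minimal sketch; the dimension names and counts are hypothetical, and the default high-to-low direction is an assumption, since the message does not say which direction the slide recommended.

```python
from typing import Dict, List


def order_by_cardinality(cardinality: Dict[str, int], descending: bool = True) -> List[str]:
    """Sort dimension names by cardinality; ties fall back to the name for a repeatable order."""
    sign = -1 if descending else 1
    return sorted(cardinality, key=lambda dim: (sign * cardinality[dim], dim))


# Hypothetical dimensions and distinct-value counts.
dims = {"dt": 365, "region": 21, "user_id": 1_000_000, "site_id": 50_000}
print(order_by_cardinality(dims))  # ['user_id', 'site_id', 'dt', 'region']
```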
>> >>>>
>> >>>> We also noticed that only half of the memory and
>> >>>> half of the CPU are used during the cube build.
>> >>>> Are there any ways to fully utilize the resources?
>> >>>>
>> >>>> Looking forward to hearing from you.
>> >>>>
>> >>>> Best regards,
>> >>>> Zhong
>> >>>>
>> >>>
>> >>>
>> >>>
>> >>> --
>> >>> Best regards,
>> >>>
>> >>> Shaofeng Shi
>> >>
>> >>
>> >
>> >
>> > --
>> > Regards,
>> >
>> > *Bin Mahone | 马洪宾*
>> > Apache Kylin: http://kylin.io
>> > Github: https://github.com/binmahone
>>
>>
>



