For Meng's case, writing 5GB takes 40 minutes, which is really slow. The
bottleneck should be the HDFS write: the cuboid has already been calculated,
and that step just converts it to HFile format, with no further computation.
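
As rough arithmetic (assuming the full 40 minutes is spent in that step):
5 GB / 2400 s is about 2 MB/s of effective write throughput, which
quantifies just how slow that is.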

2016-01-15 15:36 GMT+08:00 hongbin ma <mahong...@apache.org>:

> if it works I'd love to see the change
>
> On Fri, Jan 15, 2016 at 3:35 PM, hongbin ma <mahong...@apache.org> wrote:
>
> > I'm not sure if it will work; does HBase bulk load allow that?
> >
> > On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun <sunye...@gmail.com> wrote:
> >
> >> hongbin,
> >>
> >> I understand how the number of reducers is determined, and it could be
> >> improved.
> >>
> >> Suppose we have 100GB of data after cuboid building, with a setting of
> >> 10GB per region. Currently, 10 split keys are calculated, 10 regions are
> >> created, and 10 reducers are used in the 'convert to hfile' step.
> >>
> >> With the optimization, we could calculate 100 (or more) split keys and
> >> use all of them in the 'convert to hfile' step, but sample only 10 of
> >> them to create regions. The result is still 10 regions created, but 100
> >> reducers used in the 'convert to hfile' step. Of course, 100 HFiles are
> >> created as well, and 10 files are loaded per region. That should be
> >> fine, and shouldn't affect query performance dramatically.
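> >>
> >> To illustrate the sampling idea, a rough, untested sketch (the names are
> >> made up for illustration, not Kylin's actual API):
> >>
> >>     import java.util.ArrayList;
> >>     import java.util.List;
> >>
> >>     public class SplitKeySampler {
> >>         // Given split keys sized for the reducers (e.g. 99 keys for
> >>         // 100 slices of ~1GB), keep every step-th key as a region
> >>         // boundary, so 'step' HFiles are bulk-loaded into each region.
> >>         static List<byte[]> sampleRegionKeys(List<byte[]> reducerKeys, int step) {
> >>             List<byte[]> regionKeys = new ArrayList<>();
> >>             for (int i = step - 1; i < reducerKeys.size(); i += step) {
> >>                 regionKeys.add(reducerKeys.get(i));
> >>             }
> >>             return regionKeys;
> >>         }
> >>     }
> >>
> >> With 99 reducer split keys (100 slices) and step = 10, this yields 9
> >> region boundaries, i.e. 10 regions, while all 99 keys still drive the
> >> 'convert to hfile' partitioner.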
> >>
> >> > On Jan 15, 2016, at 13:53, hongbin ma <mahong...@apache.org> wrote:
> >> >
> >> > hi, yerui,
> >> >
> >> > The reason the number of "convert to hfile" reducers is small is that
> >> > each reducer's output becomes an HTable region. Too many regions would
> >> > be a burden on the HBase cluster. In our production environment we have
> >> > cubes that are 10T+; guess how many regions those would populate?
> >> >
> >> > What's more, Kylin provides different profiles to control the expected
> >> > region size (and thus the number of regions and the parallelism of the
> >> > "convert to hfile" reducers); you can choose one depending on your cube
> >> > size. In 2.x it's basically 10G for small cubes, 20G for medium cubes,
> >> > and 100G for large cubes. However, this is manual work when creating
> >> > the cube, and I admit the value settings for the three profiles are
> >> > still debatable.
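> >> >
> >> > For reference, in a 1.x-era kylin.properties the three profiles are
> >> > configured with properties like the following (my recollection of the
> >> > names; please verify against your version's kylin.properties):
> >> >
> >> >     # expected region size in GB, per cube-capacity profile
> >> >     kylin.hbase.region.cut.small=10
> >> >     kylin.hbase.region.cut.medium=20
> >> >     kylin.hbase.region.cut.large=100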
> >> >
> >> >
> >> >
> >> >
> >> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:
> >> >
> >> >> Agreed with 梁猛.
> >> >>
> >> >> Actually, we found the same issue: the number of reducers in the
> >> >> 'convert to hfile' step is too small, since it is the same as the
> >> >> region count.
> >> >>
> >> >> I think we could increase the number of reducers to improve
> >> >> performance. If anyone is interested in this, we could discuss the
> >> >> solution further.
> >> >>
> >> >>> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
> >> >>>
> >> >>> Actually, I found the last step, "convert to hfile", takes too much
> >> >>> time: more than 40 minutes for a single region (using the small
> >> >>> profile, with a result file of about 5GB).
> >> >>>
> >> >>>
> >> >>>
> >> >>> China Mobile Guangdong Co., Ltd., Network Management Center, 梁猛
> >> >>> 13802880...@139.com
> >> >>>
> >> >>> From: ShaoFeng Shi
> >> >>> Date: 2016-01-15 09:40
> >> >>> To: dev
> >> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
> >> >>> Cube build performance is largely determined by your Hadoop cluster's
> >> >>> capacity. You can inspect the MR jobs' statistics to analyze the
> >> >>> potential bottlenecks.
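> >> >>>
> >> >>> For example, with standard Hadoop tooling (the job id below is a
> >> >>> placeholder):
> >> >>>
> >> >>>     mapred job -status <job_id>
> >> >>>
> >> >>> The counters (HDFS bytes read/written, CPU time, GC time) and the
> >> >>> map vs. reduce durations in the JobHistory web UI usually point at
> >> >>> the slow phase.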
> >> >>>
> >> >>>
> >> >>>
> >> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
> >> >>>
> >> >>>> Hi All,
> >> >>>>
> >> >>>> We are trying to build a nine-dimension cube:
> >> >>>> eight mandatory dimensions and one hierarchy
> >> >>>> dimension. The fact table is about 20G, and the
> >> >>>> two lookup tables are 1.3M and 357K respectively.
> >> >>>> It takes about 3 hours to reach 30% progress,
> >> >>>> which is quite slow.
> >> >>>>
> >> >>>> We'd like to know whether there are ways to speed
> >> >>>> up the Kylin cube build. One suggestion, from a
> >> >>>> slide, was to order the dimensions by cardinality.
> >> >>>> Are there any other approaches we can try?
> >> >>>>
> >> >>>> We also noticed that only half of the memory and
> >> >>>> half of the CPU is used during the cube build.
> >> >>>> Are there any ways to fully utilize the resources?
> >> >>>>
> >> >>>> Looking forward to hearing from you.
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Zhong
> >> >>>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Best regards,
> >> >>>
> >> >>> Shaofeng Shi
> >> >>
> >> >>
> >> >
> >> >
> >> > --
> >> > Regards,
> >> >
> >> > *Bin Mahone | 马洪宾*
> >> > Apache Kylin: http://kylin.io
> >> > Github: https://github.com/binmahone
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>



-- 
Best regards,

Shaofeng Shi
