Hi Yerui,

The reason the number of "convert to hfile" reducers is small is that each
reducer's output becomes one HTable region. Too many regions would be a
burden on the HBase cluster. In our production environment we have cubes of
10 TB+; guess how many regions they would produce?

What's more, Kylin provides different profiles to control the expected
region size (and thus the number of regions and the parallelism of the
"convert to hfile" reducers), so you can pick one depending on your cube
size. In 2.x the defaults are 10 GB for small cubes, 20 GB for medium cubes
and 100 GB for large cubes. However, the profile has to be chosen manually
when creating the cube, and I admit the values for the three profiles are
still debatable.
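To make the trade-off concrete, here is a rough back-of-the-envelope sketch. It assumes that the region count (and therefore the reducer count) is simply ceil(cube size / region size), which is a simplification; estimate_regions and REGION_SIZE_GB are hypothetical names, not Kylin APIs, and the sizes are the 2.x profile defaults listed above.

```python
import math

# Hypothetical lookup of the per-profile region sizes (GB) quoted above.
# Assumption: regions ~= ceil(cube_size / region_size); Kylin's real split
# logic may differ.
REGION_SIZE_GB = {"small": 10, "medium": 20, "large": 100}


def estimate_regions(cube_size_gb: float, profile: str) -> int:
    """Rough number of HTable regions, i.e. "convert to hfile" reducers."""
    region_size_gb = REGION_SIZE_GB[profile]
    # At least one region is always created, even for a tiny cube.
    return max(1, math.ceil(cube_size_gb / region_size_gb))


if __name__ == "__main__":
    cube_size_gb = 10 * 1024  # a 10 TB cube
    for profile in ("small", "medium", "large"):
        print(profile, estimate_regions(cube_size_gb, profile))
```

Under this assumption a 10 TB cube yields about 1024, 512 and 103 regions for the small, medium and large profiles, while a 5 GB result with the small profile fits in a single region, so only one reducer does all the work.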




On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:

> Agreed with 梁猛.
>
> Actually, we found the same issue: the number of reducers in the
> 'convert to hfile' step is too small, since it equals the region count.
>
> I think we could increase the number of reducers to improve performance.
> If anyone is interested in this, we can discuss the solution further.
>
> > On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
> >
> > Actually, I found that the last step, "convert to hfile", takes too much
> > time: more than 40 minutes for a single region (using the small profile,
> > with a result file of about 5 GB).
> >
> >
> >
> > China Mobile Guangdong Co., Ltd., Network Management Center, 梁猛
> > 13802880...@139.com
> >
> > From: ShaoFeng Shi
> > Date: 2016-01-15 09:40
> > To: dev
> > Subject: Re: beg suggestions to speed up the Kylin cube build
> > Cube build performance is largely determined by your Hadoop cluster's
> > capacity. You can inspect the MR jobs' statistics to analyze potential
> > bottlenecks.
> >
> >
> >
> > 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
> >
> >> Hi All,
> >>
> >> We are trying to build a nine-dimension cube:
> >> eight mandatory dimensions and one hierarchy
> >> dimension. The fact table is about 20 GB, and the two
> >> lookup tables are 1.3M and 357k respectively. It takes
> >> about 3 hours to reach 30% progress, which is rather slow.
> >>
> >> We'd like to know whether there are suggestions for
> >> speeding up the Kylin cube build. One suggestion we got,
> >> from a slide, was to sort the dimensions by cardinality.
> >> Are there any other approaches we can try?
> >>
> >> We also noticed that only half of the memory and
> >> half of the CPU are used during the cube build.
> >> Are there any ways to fully utilize the resources?
> >>
> >> Looking forward to hearing from you.
> >>
> >> Best regards,
> >> Zhong
> >>
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
