I'm not sure if it will work; does HBase bulk load allow that?

On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun <sunye...@gmail.com> wrote:

> hongbin,
>
> I understand how the number of reducers is determined, and it could be
> improved.
>
> Suppose we get 100GB of data after cuboid building, with a setting of
> 10GB per region. Currently, 10 split keys are calculated, 10 regions are
> created, and 10 reducers are used in the ‘convert to hfile’ step.
>
> With the optimization, we could calculate 100 (or more) split keys and
> use all of them in the ‘convert to hfile’ step, but sample only 10 of
> them to create regions. The result is still 10 regions created, but 100
> reducers used in the ‘convert to hfile’ step. Of course, 100 hfiles are
> created as well, with 10 files loaded per region. That should be fine,
> and shouldn’t affect query performance dramatically.
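>
> A rough sketch of the sampling idea in Java (the class and method names
> here are illustrative, not the actual Kylin code):
>
>     import java.util.ArrayList;
>     import java.util.List;
>
>     public class SplitKeySampler {
>         /**
>          * Keep every k-th oversampled split key as a region boundary.
>          * The full splitKeys list still partitions the 'convert to
>          * hfile' reducers; only the returned subset is used to create
>          * regions.
>          */
>         public static List<byte[]> sampleRegionKeys(List<byte[]> splitKeys, int k) {
>             List<byte[]> regionKeys = new ArrayList<>();
>             for (int i = k - 1; i < splitKeys.size(); i += k) {
>                 regionKeys.add(splitKeys.get(i));
>             }
>             return regionKeys;
>         }
>     }
>
> With 100 split keys and k = 10, all 100 keys drive the reducer
> partitioning while only 10 become region boundaries, matching the
> numbers above.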
>
> > On Jan 15, 2016, at 13:53, hongbin ma <mahong...@apache.org> wrote:
> >
> > hi, yerui,
> >
> > the reason why the number of "convert to hfile" reducers is small is
> > that each reducer's output will become an htable region. Too many
> > regions will be a burden to the hbase cluster. In our production env we
> > have cubes that are 10T+; guess how many regions that would populate?
> >
> > What's more, Kylin provides different profiles to control the expected
> > region size (thus controlling the number of regions and, in turn, the
> > parallelism of the "convert to hfile" reducers); you can choose one
> > depending on your cube size. In 2.x it's basically 10G per region for
> > small cubes, 20G for medium cubes, and 100G for large cubes. However,
> > this is manual work when creating a cube, and I admit the value
> > settings for the three profiles are still open to discussion.
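> >
> > To put numbers on that: at the small profile's 10G per region, a 10T
> > cube would populate roughly 10T / 10G = 1000 regions; even at 100G per
> > region it is still around 100.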
> >
> >
> >
> >
> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:
> >
> >> Agreed with 梁猛.
> >>
> >> Actually we found the same issue: the number of reducers in the
> >> ‘convert to hfile’ step is too small, since it is the same as the
> >> region count.
> >>
> >> I think we could increase the number of reducers to improve performance.
> >> If anyone is interested in this, we could discuss the solution further.
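> >>
> >> For context on why the two numbers are coupled today: with the HBase
> >> bulk-load API, HFileOutputFormat2.configureIncrementalLoad() reads the
> >> target table's region start keys, installs a TotalOrderPartitioner over
> >> them, and sets the number of reduce tasks to the number of regions. A
> >> minimal sketch (HBase 0.98-era API; the table name is just a
> >> placeholder):
> >>
> >>     import org.apache.hadoop.conf.Configuration;
> >>     import org.apache.hadoop.hbase.HBaseConfiguration;
> >>     import org.apache.hadoop.hbase.client.HTable;
> >>     import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
> >>     import org.apache.hadoop.mapreduce.Job;
> >>
> >>     public class HFileJobSketch {
> >>         public static void main(String[] args) throws Exception {
> >>             Configuration conf = HBaseConfiguration.create();
> >>             Job job = Job.getInstance(conf, "convert to hfile");
> >>             HTable table = new HTable(conf, "EXAMPLE_CUBE"); // placeholder
> >>             // Looks up the table's region boundaries, configures
> >>             // TotalOrderPartitioner over them, and sets the reducer
> >>             // count to the region count.
> >>             HFileOutputFormat2.configureIncrementalLoad(job, table);
> >>         }
> >>     }
> >>
> >> Decoupling them means partitioning the reducers by a finer key list
> >> than the region boundaries, as described above.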
> >>
> >>> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
> >>>
> >>> Actually, I found the last step "convert to hfile" takes too much
> >>> time: more than 40 minutes for a single region (using the small
> >>> profile, with a result file of about 5GB).
> >>>
> >>>
> >>>
> >>> China Mobile Guangdong Co., Ltd., Network Management Center, 梁猛
> >>> 13802880...@139.com
> >>>
> >>> From: ShaoFeng Shi
> >>> Date: 2016-01-15 09:40
> >>> To: dev
> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
> >>> Cube build performance is largely determined by your Hadoop cluster's
> >>> capacity. You can inspect the MR jobs' statistics to analyze the
> >>> potential bottlenecks.
> >>>
> >>>
> >>>
> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
> >>>
> >>>> Hi All,
> >>>>
> >>>> We are trying to build a nine-dimension cube:
> >>>> eight mandatory dimensions and one hierarchy
> >>>> dimension. The fact table is about 20G. The two lookup
> >>>> tables are 1.3M and 357K respectively. It takes about
> >>>> 3 hours to reach 30% progress, which is kind of slow.
> >>>>
> >>>> We'd like to know whether there are any suggestions to speed up
> >>>> the Kylin cube build. One suggestion we got from a slide
> >>>> was to sort the dimensions by cardinality.
> >>>> Are there any other ways we can try?
> >>>>
> >>>> We also noticed that only half of the memory and
> >>>> half of the CPU are used during the cube build.
> >>>> Are there any ways to fully utilize the resources?
> >>>>
> >>>> Looking forward to hearing from you.
> >>>>
> >>>> Best regards,
> >>>> Zhong
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> Shaofeng Shi
> >>
> >>
> >
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
>
>


-- 
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
