I'm not sure if it will work; does HBase bulk load allow that?

On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun <sunye...@gmail.com> wrote:
> hongbin,
>
> I understand how the number of reducers is determined, and it could be
> improved.
>
> Suppose we have 100GB of data after cuboid building, with a setting of
> 10GB per region. Currently, 10 split keys are calculated, 10 regions are
> created, and 10 reducers are used in the 'convert to hfile' step.
>
> With the optimization, we could calculate 100 (or more) split keys and use
> all of them in the 'convert to hfile' step, but sample 10 keys from them
> to create the regions. The result is still 10 regions created, but 100
> reducers used in the 'convert to hfile' step. Of course, 100 hfiles are
> also created, with 10 files loaded per region. That should be fine; it
> doesn't affect query performance dramatically.
>
> > On Jan 15, 2016, at 13:53, hongbin ma <mahong...@apache.org> wrote:
> >
> > hi, yerui,
> >
> > the reason why the number of "convert to hfile" reducers is small is
> > that each reducer's output will become an htable region. Too many
> > regions will be a burden to the hbase cluster. In our production env we
> > have cubes that are 10T+; guess how many regions they will populate?
> >
> > What's more, Kylin provides different profiles to control the expected
> > region size (thus controlling the number of regions and the parallelism
> > of the 'convert to hfile' reducers); you can modify it depending on your
> > cube size. In 2.x it's basically 10G for small cubes, 20G for medium
> > cubes, and 100G for large ones. However, this is manual work when
> > creating a cube, and I admit the value settings for the three profiles
> > are still debatable.
> >
> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:
> >
> >> Agreed with 梁猛.
> >>
> >> Actually we found the same issue: the number of reducers in the
> >> 'convert to hfile' step is too small, as it equals the region count.
> >>
> >> I think we could increase the number of reducers to improve
> >> performance. If anyone is interested in this, we could discuss the
> >> solution further.
> >>
> >>> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
> >>>
> >>> actually, I found the last step "convert to hfile" takes too much
> >>> time, more than 40 minutes for a single region (using the small
> >>> profile, with a result file of about 5GB)
> >>>
> >>> China Mobile Guangdong, Network Management Center, 梁猛
> >>> 13802880...@139.com
> >>>
> >>> From: ShaoFeng Shi
> >>> Date: 2016-01-15 09:40
> >>> To: dev
> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
> >>> The cube build performance is largely determined by your Hadoop
> >>> cluster's capacity. You can inspect the MR jobs' statistics to analyze
> >>> the potential bottlenecks.
> >>>
> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
> >>>
> >>>> Hi All,
> >>>>
> >>>> We are trying to build a nine-dimension cube: eight mandatory
> >>>> dimensions and one hierarchy dimension. The fact table is about 20GB,
> >>>> and the two lookup tables are 1.3MB and 357KB respectively. It takes
> >>>> about 3 hours to reach 30% progress, which is rather slow.
> >>>>
> >>>> We'd like to know whether there are suggestions to speed up the Kylin
> >>>> cube build. A slide suggested sorting the dimensions by cardinality.
> >>>> Are there any other ways we can try?
> >>>>
> >>>> We also noticed that only half of the memory and half of the CPU are
> >>>> used during the cube build. Are there any ways to fully utilize the
> >>>> resources?
> >>>>
> >>>> Looking forward to hearing from you.
> >>>>
> >>>> Best regards,
> >>>> Zhong
> >>>
> >>>
> >>> --
> >>> Best regards,
> >>>
> >>> Shaofeng Shi
> >>
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
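[Editor's note: Yerui's proposed optimization can be sketched in a few lines. This is a minimal illustration in Python of the split-key oversampling idea, not Kylin's actual implementation; the function and parameter names are hypothetical. The key invariant is that the region boundaries are a subset of the reducer boundaries, so each HFile produced by a reducer lands entirely inside one region and bulk load never has to split a file.]

```python
# Sketch of the proposed split-key oversampling (hypothetical names,
# not Kylin's actual code).

def plan_splits(sorted_row_keys, n_reducers, n_regions):
    """Pick n_reducers - 1 reducer split keys from the sampled row keys,
    then keep a stride-based subset of them as HBase region boundaries.

    Because the region boundaries are a subset of the reducer
    boundaries, every HFile written by a reducer falls entirely inside
    one region; at bulk-load time each region simply receives
    n_reducers / n_regions HFiles.
    """
    step = len(sorted_row_keys) // n_reducers
    reducer_splits = [sorted_row_keys[i * step] for i in range(1, n_reducers)]
    # keep every (n_reducers // n_regions)-th reducer boundary as a
    # region boundary
    stride = n_reducers // n_regions
    region_splits = reducer_splits[stride - 1::stride][: n_regions - 1]
    return reducer_splits, region_splits

# Example matching the thread: 100 reducers writing HFiles in parallel,
# but only 10 regions created; each region gets ~10 HFiles to load.
keys = [b"%08d" % i for i in range(100000)]
reducers, regions = plan_splits(keys, n_reducers=100, n_regions=10)
```

Under this scheme the 'convert to hfile' parallelism is decoupled from the region count, which is the point of the proposal.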