If it works, I'd love to see the change.

On Fri, Jan 15, 2016 at 3:35 PM, hongbin ma <mahong...@apache.org> wrote:
> I'm not sure if it will work. Does HBase bulk load allow that?
>
> On Fri, Jan 15, 2016 at 2:28 PM, Yerui Sun <sunye...@gmail.com> wrote:
>
>> hongbin,
>>
>> I understand how the number of reducers is determined, and it could be
>> improved.
>>
>> Suppose that we get 100GB of data after cuboid building, with a setting
>> of 10GB per region. For now, 10 split keys are calculated, 10 regions
>> are created, and 10 reducers are used in the ‘convert to hfile’ step.
>>
>> With the optimization, we could calculate 100 (or more) split keys and
>> use all of them in the ‘convert to hfile’ step, but sample 10 of them to
>> create regions. The result is still 10 regions created, but 100 reducers
>> used in the ‘convert to hfile’ step. Of course, 100 hfiles are created
>> as well, and 10 files are loaded per region. That should be fine and
>> shouldn't affect query performance dramatically.
>>
>> > On Jan 15, 2016, at 13:53, hongbin ma <mahong...@apache.org> wrote:
>> >
>> > hi, yerui,
>> >
>> > The reason why the number of "convert to hfile" reducers is small is
>> > that each reducer's output will become an htable region. Too many
>> > regions will be a burden on the HBase cluster. In our production env we
>> > have cubes that are 10T+; guess how many regions that would populate?
>> >
>> > What's more, Kylin provides different profiles to control the expected
>> > region size (thus controlling the number of regions & parallelism of
>> > "create htable" reducer); you can modify it depending on your cube
>> > size. In 2.x it's basically 10G for small cubes, 20G for medium cubes
>> > and 100G for large cubes. However, this is manual work when creating a
>> > cube, and I admit the values for the three profiles are still open to
>> > discussion.
>> >
>> > On Fri, Jan 15, 2016 at 11:29 AM, Yerui Sun <sunye...@gmail.com> wrote:
>> >
>> >> Agreed with 梁猛.
>> >>
>> >> Actually, we found the same issue: the number of reducers in the
>> >> ‘convert to hfile’ step is too small, because it is the same as the
>> >> region count.
>> >>
>> >> I think we could increase the number of reducers to improve
>> >> performance. If anyone is interested in this, we could discuss the
>> >> solution further.
>> >>
>> >>> On Jan 15, 2016, at 09:46, 13802880...@139.com wrote:
>> >>>
>> >>> Actually, I found that the last step, "convert to hfile", takes too
>> >>> much time: more than 40 minutes for a single region (using "small",
>> >>> with a result file of about 5GB).
>> >>>
>> >>> China Mobile Guangdong Co., Ltd., Network Management Center, 梁猛
>> >>> 13802880...@139.com
>> >>>
>> >>> From: ShaoFeng Shi
>> >>> Date: 2016-01-15 09:40
>> >>> To: dev
>> >>> Subject: Re: beg suggestions to speed up the Kylin cube build
>> >>> The cube build performance is largely determined by your Hadoop
>> >>> cluster's capacity. You can inspect the MR job's statistics to
>> >>> analyze the potential bottlenecks.
>> >>>
>> >>> 2016-01-15 7:19 GMT+08:00 zhong zhang <zzaco...@gmail.com>:
>> >>>
>> >>>> Hi All,
>> >>>>
>> >>>> We are trying to build a nine-dimension cube:
>> >>>> eight mandatory dimensions and one hierarchy
>> >>>> dimension. The fact table is about 20G. The two lookup
>> >>>> tables are 1.3M and 357k respectively. It takes about
>> >>>> 3 hours to reach 30% progress, which is kind of slow.
>> >>>>
>> >>>> We'd like to know whether there are any suggestions to
>> >>>> speed up the Kylin cube build. One suggestion we got
>> >>>> from a slide is to sort the dimensions based on their
>> >>>> cardinality. Are there any other ways we can try?
>> >>>>
>> >>>> We also noticed that only half of the memory and
>> >>>> half of the CPU are used during the cube build.
>> >>>> Is there any way to fully utilize the resources?
>> >>>>
>> >>>> Looking forward to hearing from you.
>> >>>>
>> >>>> Best regards,
>> >>>> Zhong
>> >>>>
>> >>>
>> >>> --
>> >>> Best regards,
>> >>>
>> >>> Shaofeng Shi
>> >>
>> >
>> > --
>> > Regards,
>> >
>> > *Bin Mahone | 马洪宾*
>> > Apache Kylin: http://kylin.io
>> > Github: https://github.com/binmahone
>>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>

--
Regards,

*Bin Mahone | 马洪宾*
Apache Kylin: http://kylin.io
Github: https://github.com/binmahone
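
For readers following the thread above, here is a rough sketch of the arithmetic behind Yerui's proposal: compute fine-grained split keys for the reducers, then keep only every k-th key as a region boundary. It is not Kylin code. The function names, the reducers-per-region factor, and the use of integers as stand-in row keys are all assumptions made for illustration, and it does not answer hongbin's question of whether HBase bulk load accepts several hfiles per region.

```python
import bisect
import math


def plan_counts(total_gb, region_gb, reducers_per_region):
    """Return (region_count, reducer_count) for a given cube size.

    reducers_per_region is a hypothetical knob, not a Kylin setting.
    """
    regions = max(1, math.ceil(total_gb / region_gb))
    return regions, regions * reducers_per_region


def sample_region_keys(reducer_end_keys, reducers_per_region):
    """Keep every k-th reducer end key as a region end key."""
    k = reducers_per_region
    region_keys = reducer_end_keys[k - 1::k]
    # Make sure the last reducer partition is covered by some region.
    if reducer_end_keys and (not region_keys
                             or region_keys[-1] != reducer_end_keys[-1]):
        region_keys.append(reducer_end_keys[-1])
    return region_keys


def hfiles_per_region(reducer_end_keys, region_end_keys):
    """Count how many reducer outputs (hfiles) fall into each region."""
    counts = [0] * len(region_end_keys)
    for end_key in reducer_end_keys:
        # The first region whose end key is >= this reducer's end key
        # is the region that contains the reducer's whole key range.
        counts[bisect.bisect_left(region_end_keys, end_key)] += 1
    return counts


if __name__ == "__main__":
    # The 100GB example from the thread: 10GB per region, 10 reducers each.
    regions, reducers = plan_counts(100, 10, 10)
    print(regions, reducers)  # 10 100

    # Integers stand in for sorted row keys (an assumption for the demo).
    reducer_keys = list(range(1, reducers + 1))
    region_keys = sample_region_keys(reducer_keys, 10)
    print(hfiles_per_region(reducer_keys, region_keys))  # ten entries of 10
```

The same helper also shows hongbin's concern about region counts: plan_counts(10 * 1024, 10, 1) gives 1024 regions for a 10T cube at 10GB per region.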