First, identify why step-2 is slow. A good starting point is to check whether the mapper input splits are even.
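One quick way to check split evenness is to copy each mapper's "Map input records" (or elapsed time) from the job's history page and compare the largest value with the average. The sketch below assumes you have those numbers as a plain list. The function names, the 1.5 threshold and the sample values are all hypothetical; none of them is a Kylin or Hadoop setting.

```python
from statistics import mean, median


def split_skew(task_values):
    """Return max / mean of per-mapper values (input records or seconds).

    A ratio near 1.0 means the splits are roughly even; a large ratio means
    one or two mappers do most of the work and hold up the whole step.
    """
    if not task_values:
        raise ValueError("need at least one mapper value")
    avg = mean(task_values)
    if avg == 0:
        return 1.0
    return max(task_values) / avg


def report(task_values, threshold=1.5):
    """Print a one-line summary. `threshold` is an arbitrary cut-off chosen
    for illustration, not a value taken from Kylin or Hadoop."""
    ratio = split_skew(task_values)
    verdict = "skewed" if ratio > threshold else "roughly even"
    print(f"mappers={len(task_values)} max/mean={ratio:.2f} "
          f"median={median(task_values)} -> {verdict}")
    return verdict


if __name__ == "__main__":
    # Hypothetical per-mapper input record counts for a 6-mapper job;
    # replace them with the real counters from your own job.
    sample = [400_000, 350_000, 150_000, 50_000, 30_000, 20_000]
    report(sample)
```

If the ratio is high, the step's wall-clock time is set by the slowest mapper, and adding parallelism alone may not help much until the input is spread more evenly.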
On Thu, Jun 2, 2016 at 5:25 PM, Vaibhav Taro <vaibhavtar...@gmail.com> wrote:
> Hey ShaoFeng, thanks for the reply. Yes, my Kylin version is 1.5.2.
>
> I have consistently observed that step-2 takes more time than the other
> steps in the cube build process, for example with 1m, 5m, 10m and even with
> 498m records in Hive. For 498m records in Hive, the cube build process took
> 56 minutes and step-2 took around 16 minutes, which was more than the time
> taken by the other steps in the cube build.
>
> Is there any indicator in the UI showing which cubing algorithm is being
> used for the cube build job? In my cube build (1m) I can see that the first
> two steps, the cube build step and the "convert cuboid data to HFile" step
> are MapReduce jobs. This job is using "fast cubing", as I can see in the
> logs; however, the statistics in the UI indicate that step-2 is taking more
> time.
>
> Is there any other tuning I can do to minimize the time taken by the cube
> build process? I can increase memory and parallelism.
>
> On Wed, Jun 1, 2016 at 8:07 PM, ShaoFeng Shi <shaofeng...@apache.org> wrote:
>
> > I guess you're running 1.5.x, which can build the cube with the "fast
> > cubing" algorithm: the cube build can be finished in one round of MR, and
> > if the data fits into memory the calculation is fast.
> >
> > Step 2 ("fetch distinct values of dimensions") should be a more
> > lightweight step compared with the cube build; in my experience it
> > should take less time than the cube build step. But as 1 million rows is
> > such a small data set, one or two rounds of testing may not reflect the
> > real case. I suggest you run more tests with a bigger data set. Please
> > share your findings with us, thanks!
> >
> > 2016-05-31 15:10 GMT+08:00 Vaibhav Taro <vaibhavtar...@gmail.com>:
> >
> > > I am building a cube with 1 million rows in Hive, where the Hive table
> > > is in ORC format, partitioned by date and clustered into 8 buckets.
> > >
> > > The cube has 12 dimensions and 2 measures, and the cube build job takes
> > > around 5 minutes to complete.
> > >
> > > The "Extract Fact Table Distinct Columns" step takes around 3 minutes,
> > > which is more than the time taken by the other steps. The MapReduce job
> > > spawns 6 mappers and 10 reducers for this step.
> > >
> > > What are the factors affecting the time taken by this step?
> > > I am using Kylin version 1.5.2 with default settings.
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi
>
> --
> Regards,
> VaibhaV
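ShaoFeng describes step 2 as "fetch distinct values of dimensions". The toy model below is a simplification for reasoning about that step, not Kylin's actual implementation: each mapper de-duplicates (column, value) pairs within its own split, and all values of one column are routed to the same reducer by column position. The column names, the routing rule and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def map_split(rows, dim_columns):
    """One mapper: emit each distinct (column, value) pair in its split once."""
    seen = set()
    for row in rows:
        for col in dim_columns:
            seen.add((col, row[col]))
    return seen


def run_step(splits, dim_columns, num_reducers):
    """Return (distinct-value count per column, pairs received per reducer).

    Assumed routing: column i goes to reducer i % num_reducers, so every
    value of a given column is de-duplicated on a single reducer.
    """
    distinct = defaultdict(set)
    load = [0] * num_reducers
    for split in splits:
        for col, value in map_split(split, dim_columns):
            load[dim_columns.index(col) % num_reducers] += 1
            distinct[col].add(value)  # reducer-side de-duplication
    return {col: len(values) for col, values in distinct.items()}, load


if __name__ == "__main__":
    # Hypothetical dimensions: one low-cardinality, one high-cardinality.
    dims = ["country", "user_id"]
    splits = [
        [{"country": "CN", "user_id": 1}, {"country": "CN", "user_id": 2}],
        [{"country": "US", "user_id": 3}, {"country": "CN", "user_id": 4}],
    ]
    counts, load = run_step(splits, dims, num_reducers=2)
    print(counts, load)
```

In this model the reducer that owns a high-cardinality column (here `user_id`) receives the most pairs, so per-column cardinality and how columns are spread over reducers are worth checking alongside mapper split evenness.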