Re: Extract Fact Table Distinct Columns step taking more time

Li Yang Tue, 07 Jun 2016 03:08:00 -0700

We first need to identify why "step-2" is slow.

A good starting point is to check whether the mapper input splits are even.
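
One way to check whether the mapper splits are even is to compare the per-mapper input record counts (for example the "Map input records" counter of each map task in the Hadoop job history UI) against their average. The snippet below is only a minimal sketch of that comparison: the function name split_skew and the record counts are hypothetical and are not taken from the job discussed in this thread.

```python
from statistics import mean


def split_skew(records_per_mapper):
    """Return (max/mean ratio, index of the heaviest mapper).

    A ratio close to 1.0 means the splits are roughly even; a ratio well
    above 1.0 means one mapper is doing much more work than the others.
    """
    if not records_per_mapper:
        raise ValueError("need at least one mapper")
    avg = mean(records_per_mapper)
    heaviest = max(range(len(records_per_mapper)),
                   key=records_per_mapper.__getitem__)
    ratio = records_per_mapper[heaviest] / avg if avg else float("inf")
    return ratio, heaviest


# Hypothetical input-record counts for 6 mappers (made-up numbers).
counts = [120_000, 118_000, 125_000, 122_000, 119_000, 396_000]
ratio, idx = split_skew(counts)
print(f"mapper {idx} handles {ratio:.2f}x the average input")
```

If one mapper stands out like this, the step's wall-clock time is likely dominated by that single task, and the way the source table is split would be the next thing to look at.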


On Thu, Jun 2, 2016 at 5:25 PM, Vaibhav Taro <vaibhavtar...@gmail.com>
wrote:

> Hey ShaoFeng, thanks for the reply. Yes, my Kylin version is 1.5.2.
>
> I have consistently observed that step-2 takes more time than the other
> steps in the cube build process, for example with 1m, 5m, 10m and even
> 498m records in Hive. For 498m records, the cube build process took 56
> minutes and step-2 took around 16 minutes, which was more than the time
> taken by the other steps in the cube build.
>
> Is there any indicator in the UI showing which cubing algorithm is being
> used for the cube build job? In my cube build (1m), I can see that the first
> two steps, the cube build step and the "convert cuboid data to HFile" step
> are map-reduce jobs. This job is using "fast cubing", as I can see in the
> logs; however, the statistics in the UI indicate that step-2 is taking more
> time.
>
> Is there any other tuning that I can do to minimize the time taken by the
> cube build process? I can increase memory and parallelism.
>
>
>
> On Wed, Jun 1, 2016 at 8:07 PM, ShaoFeng Shi <shaofeng...@apache.org>
> wrote:
>
> > I guess you're running 1.5.x, which can build the cube with the "fast
> > cubing" algorithm: the cube build can be finished in one round of MR, and
> > if the data fits into memory the calculation is fast.
> >
> > Step 2 ("fetch distinct values of dimensions") should be a more
> > lightweight step compared with the cube build; in my experience it
> > should take less time than the cube build step. But as 1 million rows is
> > such a small data set, one or two rounds of testing may not reflect the
> > real case. I suggest you run more tests with a bigger data set. Please
> > share your findings with us, thanks!
> >
> > 2016-05-31 15:10 GMT+08:00 Vaibhav Taro <vaibhavtar...@gmail.com>:
> >
> > > I am building a cube with 1 million rows in Hive, where the Hive table
> > > is in ORC format, partitioned by date and clustered into 8 buckets.
> > >
> > > The cube has 12 dimensions and 2 measures, and the cube build job takes
> > > around 5 minutes to complete.
> > >
> > > The "Extract Fact Table Distinct Columns" step takes around 3 minutes,
> > > which is more than the time taken by the other steps. The map-reduce
> > > job spawns 6 mappers and 10 reducers for this step.
> > >
> > > What are the factors affecting the time taken by this step?
> > > I am using Kylin version 1.5.2 with default settings.
> > >
> >
> >
> >
> > --
> > Best regards,
> >
> > Shaofeng Shi
> >
>
>
>
> --
> Regards,
> VaibhaV
>
