Re: Extract Fact Table Distinct Columns step taking more time

Vaibhav Taro Thu, 02 Jun 2016 02:26:39 -0700

Hey ShaoFeng, thanks for the reply. Yes, my Kylin version is 1.5.2.

I have consistently observed that step 2 takes longer than the other steps
in the cube build process, for example with 1m, 5m, 10m and even 498m
records in Hive. For 498m records, the cube build took 56 minutes, of
which step 2 took around 16 minutes, more than any other step in the
build.
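Expressed as a share of the total build time, the figures reported in this thread show that step 2 is still the longest single step at 498m rows, but its share of the total drops from roughly 60% (about 3 of 5 minutes for the 1m run) to roughly 29% (16 of 56 minutes). That is in line with ShaoFeng's point that a small data set can skew the picture. A minimal Python check, using only the approximate timings quoted in this thread:

```python
def step_share(step_minutes: float, total_minutes: float) -> float:
    """Fraction of the total cube build time spent in one step."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return step_minutes / total_minutes


# Approximate timings reported in this thread, in minutes: (step 2, whole job).
runs = {
    "1m rows": (3, 5),      # from the original question
    "498m rows": (16, 56),  # from this reply
}

for label, (step2, total) in runs.items():
    # Prints e.g. "1m rows: step 2 = 60% of the build"
    print(f"{label}: step 2 = {step_share(step2, total):.0%} of the build")
```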


Is there any indicator in the UI showing which cubing algorithm is used for
a cube build job? In my 1m cube build, I can see that the first two steps,
the cube build step, and the "Convert Cuboid Data to HFile" step run as
MapReduce jobs. The logs show that this job uses "fast cubing"; however,
the statistics in the UI indicate that step 2 takes the most time.
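ShaoFeng describes step 2 as fetching the distinct values of the dimensions. As a rough mental model only (a simplified in-process sketch, not Kylin's actual implementation; the column names and sample rows are hypothetical, and grouping one "reducer" per column is an assumption for illustration), the step behaves like a MapReduce job that emits every dimension value of every row and then deduplicates per column. The sketch makes visible why its cost grows with the number of rows times the number of dimension columns, and with the cardinality of each column.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Row = Dict[str, str]


def map_phase(rows: Iterable[Row], dims: List[str]) -> Iterable[Tuple[str, str]]:
    # Emit one (column, value) pair per dimension per row, so the
    # number of emitted pairs is rows x dimension columns.
    for row in rows:
        for col in dims:
            yield col, row[col]


def reduce_phase(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    # Group pairs by column (hypothetically one reducer per column)
    # and keep only the distinct values seen for each column.
    distinct: Dict[str, Set[str]] = defaultdict(set)
    for col, value in pairs:
        distinct[col].add(value)
    return dict(distinct)


# Hypothetical sample fact rows with three dimension columns.
dims = ["dt", "country", "device"]
rows = [
    {"dt": "2016-05-30", "country": "IN", "device": "ios"},
    {"dt": "2016-05-30", "country": "US", "device": "ios"},
    {"dt": "2016-05-31", "country": "IN", "device": "android"},
]

result = reduce_phase(map_phase(rows, dims))
for col in sorted(result):
    print(col, sorted(result[col]))
```

With 12 dimensions instead of 3, every fact row would produce 12 pairs in this model, which is one reason the step can stay expensive even when the later cubing step is fast.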

Is there any other tuning that I can do to minimize the time taken by cube
build process? I can increase memory and parallelism.
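For the cube build step itself, one number worth estimating before adding memory or parallelism is how many cuboids the cube contains. Assuming a full cube with no pruning, every subset of the dimensions is a cuboid, so the 12 dimensions mentioned in the original question give 2^12 = 4096 cuboids. Kylin's cube design options (such as aggregation groups) are meant to prune this; the sketch below only counts the unpruned case and is illustrative, not a model of Kylin's planner.

```python
from math import comb  # requires Python 3.8+


def full_cube_cuboids(n_dims: int) -> int:
    # A full (unpruned) cube has one cuboid per subset of dimensions: 2^n.
    return 2 ** n_dims


n = 12  # dimension count from the original question
by_size = [comb(n, k) for k in range(n + 1)]  # cuboids with exactly k dimensions
assert sum(by_size) == full_cube_cuboids(n)  # binomial identity: sum C(n, k) = 2^n

print(full_cube_cuboids(n))  # 4096
print(max(by_size))          # 924, the cuboids with exactly 6 dimensions
```

Because the count doubles with every added dimension, trimming even one or two dimensions from the full combination space can matter as much as tuning memory or parallelism.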



On Wed, Jun 1, 2016 at 8:07 PM, ShaoFeng Shi <[email protected]> wrote:

> I guess you're running 1.5.x, which can build the cube with the "fast
> cubing" algorithm: the cube build can be finished in one round of MR, and
> if the data fits into memory the calculation is fast.
>
> Step 2 ("fetch distinct values of dimensions") should be a more
> lightweight step than the cube build; in my experience it should take
> less time than the cube build step. But as 1 million rows is such a
> small data set, one or two rounds of testing may not reflect the real
> case. I suggest you run more tests with a bigger data set. Please share
> your findings with us, thanks!
>
> 2016-05-31 15:10 GMT+08:00 Vaibhav Taro <[email protected]>:
>
> > I am building a cube from 1 million rows in Hive; the Hive table is in
> > ORC format, partitioned by date and clustered into 8 buckets.
> >
> > The cube has 12 dimensions and 2 measures, and the cube build job takes
> > around 5 minutes to complete.
> >
> > The "Extract Fact Table Distinct Columns" step takes around 3 minutes,
> > which is more than the time taken by the other steps. Its MapReduce job
> > spawns 6 mappers and 10 reducers.
> >
> > What factors affect the time taken by this step?
> > I am using Kylin version 1.5.2 with default settings.
> >
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>



-- 
Regards,
VaibhaV
