I guess you're running 1.5.x, which can build the cube with the "fast
cubing" algorithm: the cube build can finish in one round of MR, and
if the data fits into memory the calculation is fast.

Step 2 ("fetch distinct values of dimensions") should be a lighter-weight
step than the cube build, and in my experience it should take less time
than the cube build step. But as 1 million rows is such a small data set,
one or two rounds of testing may not reflect the real case. I suggest you
run more tests with a bigger data set. Please share your findings with
us, thanks!
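
For readers unfamiliar with what a "fetch distinct values of dimensions"
step does conceptually, here is a minimal Python sketch. It is only an
illustration of the map/reduce idea, not Kylin's actual code; the function
names (map_phase, reduce_phase), the column names, and the sample rows are
all made up for the example.

```python
from collections import defaultdict

def map_phase(rows, columns):
    """Emit one (column, value) pair per dimension cell, as a mapper might."""
    for row in rows:
        for col in columns:
            yield col, row[col]

def reduce_phase(pairs):
    """Group the pairs by column and keep only the distinct values."""
    distinct = defaultdict(set)
    for col, value in pairs:
        distinct[col].add(value)
    return {col: sorted(values) for col, values in distinct.items()}

# Hypothetical sample rows standing in for the fact table.
rows = [
    {"dt": "2016-05-30", "country": "CN"},
    {"dt": "2016-05-30", "country": "US"},
    {"dt": "2016-05-31", "country": "CN"},
]
result = reduce_phase(map_phase(rows, ["dt", "country"]))
print(result)
```

In this toy version every row is read once and one pair is emitted per
dimension column, so the work grows with the row count times the number of
dimensions. Whether the real job behaves the same way on your data is
something the larger tests would show.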

2016-05-31 15:10 GMT+08:00 Vaibhav Taro <[email protected]>:

> I am building a cube with 1 million rows in Hive, where Hive table is in
> ORC format, partitioned by date and clustered into 8 buckets.
>
> Cube has 12 dimensions and 2 measures and Cube build job takes around 5
> minutes to complete.
>
> Extract fact table distinct columns step is taking around 3 minutes which
> is more compared to the time taken by other steps. The Map-reduce job is
> spawning 6 mappers and 10 reducers for the same.
>
> What are the factors affecting the time taken by this step?
> I am using Kylin version 1.5.2 with default settings.
>



-- 
Best regards,

Shaofeng Shi
