I am building a cube from a Hive table with 1 million rows. The table is stored in ORC format, partitioned by date, and clustered into 8 buckets.
The cube has 12 dimensions and 2 measures, and the cube build job takes around 5 minutes to complete. The "Extract Fact Table Distinct Columns" step takes around 3 minutes, which is longer than any of the other steps. The MapReduce job for this step spawns 6 mappers and 10 reducers. What factors affect the time taken by this step? I am using Kylin 1.5.2 with default settings.
