Hi, you can observe the sizes of the Spark-generated cuboid files in Kylin's
working directory (for example:
/kylin/kylin_metadata/kylin-fd785bab-b875-4626-8bc3-7d46e8862d88/kylin_sales_cube/cuboid/level_base_cuboid/;
please replace the UUID and cube name with yours). If there are many small
files (e.g., only several MBs each), you should increase this configuration to
make each partition bigger (e.g., 64 MB). This is usually needed when your
cube has advanced measures like count distinct, top-n, or percentile, whose
size estimation is rather inaccurate.
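As a sketch, you could inspect the cuboid file sizes with an HDFS listing and then adjust the setting in kylin.properties. The path below is just the example from above; replace the job UUID and cube name with your own, and note 64 MB is only an illustrative value:

```shell
# List the cuboid files with human-readable sizes
# (example path; substitute your own job UUID and cube name).
hadoop fs -du -h /kylin/kylin_metadata/kylin-fd785bab-b875-4626-8bc3-7d46e8862d88/kylin_sales_cube/cuboid/level_base_cuboid/

# If most files are only a few MB, raise the partition cut size
# in kylin.properties, for example:
# kylin.engine.spark.rdd-partition-cut-mb=64
```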

This situation was improved in v2.5.0, as we enhanced the size estimation
for those measures. With 2.5 I think you don't need to worry much about it.

vishnuvardhanG <[email protected]> wrote on Friday, September 28, 2018, at 6:41 PM:

> http://kylin.apache.org/docs20/tutorial/cube_spark.html
>
> The above link mentions the effect of
> "kylin.engine.spark.rdd-partition-cut-mb" on cube building performance.
>
> How to decide the optimum value of
> "kylin.engine.spark.rdd-partition-cut-mb" for cube creation?
>
>

-- 
Best regards,

Shaofeng Shi 史少锋
